Gremlins in the Subaru

Have you ever felt that the electrical and electronics gremlins are after your car? Have you ever wondered if there were a conspiracy at play? I did today.

For 18 months our dearest Subaru Outback did not have a working remote control. It did not bother us much during the summer, but it was a painful experience during the winter. We persisted. I took it to the dealer and even they said it was not worth the money we’d spend on it. So off I went to figure out what was going on, and mind you, this was the most complex task I’ve tackled.

The first obvious step was to change the cell: a single CR1620 off of Amazon. It didn’t work. Next I tried a second remote and changed its cell as well. Nothing. That told me that, since the probability of two remotes and a single cell all being bad is minuscule, the problem had to be in the car.

Enter the Gremlins

I then set off hunting the problem.

First major set of checks

I checked the usual, easy things like reprogramming the two remotes and testing the door jamb switches. No problems there either. That meant tearing into the guts of the beast. This was a major headache because I needed to decipher the electrical circuits, port numbers, and wire colours. Things like this:

I’d take badly documented source code any day over these diagrams, which tend to span multiple pages. I had to bite the bullet, so I did. Poring over the diagrams, I was helped by my trusty multimeter. It served me well.

Second major set of checks

The signal receiver unit, which Subaru calls the Keyless Control Unit (KCU). I had no way of checking whether the unit itself was healthy, so I did the next best thing: I checked the connections going into it for continuity, ground, and voltage. That circuit was fine, though I still could not test the KCU itself. BTW, this unit is tucked away under the passenger side of the dash, and the only way to reach it is to disassemble the whole bloody centre console. No way was I doing that. Instead I checked the terminals by unplugging the connector from the side, which saved me a ton of time both in tearing things apart and in putting them back together. I used long-nose pliers with some electrical tape on the jaws to pull the connector. Something like this:

 

A closer look:

Third major set of checks

The Body Control Module (BCM). This took me down a rabbit hole because I was presented with this:

I didn’t see any splices in the wiring diagrams, so what was this nonsense? Anyway, I traced it all the way around to this:

What could it possibly be? There’s nothing in the wiring diagrams and it is held together by zip ties. For sure this is an aftermarket part. So I did the best I could and ran a photo through the Google Vision API. That didn’t help: it couldn’t read the text, and the bloody thing had me hanging upside down, straining my lower back. I tried pressing modelling putty onto it to get a negative, and that helped a little with the text, though it was still not very clear. So then I posted on Reddit. /u/mospo was the man who saved the day: it was an aftermarket alarm system. I wanted to rule that out too, so I disconnected it. I then walked through the rest of the wiring diagram. Nothing.

So there I sat, dejected that all the diagnostics in the world had not fixed the problem. There were two pieces I could not confirm yet: the BCM and the KCU. The only way to confirm these was to go back to the dealer and have them checked with a Subaru testing unit and diagnostics tool. So there the car sat, half its guts opened up, nuts and bolts all over the place. Since my wifey needed the car, I ended up putting it all back together, problem unsolved.

We moved to Denver

In the meantime we moved to Denver and this diagnosis took a back seat. This was quite honestly the most difficult thing for me to solve. I had to walk through the circuits and figure things out from the diagrams, and all I had to show for it was that everything was in spec. I even posted to Reddit again.

Micro Center visit

Now the main difference between Sioux Falls and Denver is that I can walk into an electronics store like Micro Center and pick up electronics. I can walk into Harbor Freight and pick up some tools. So there I was, picking up a Raspberry Pi Zero and a Corsair mouse pad from Micro Center, when I walked past an aisle with CR1620 cells on sale for $0.99. Not bad. Might as well pick up two. Anyway, I came home and tossed the cells in a corner. They sat there for a couple of days. Then I decided I might as well give it a try. So I plopped one of the cells into my fob… a drumroll and … nothing. It was a good try, me still clicking the unlock button on the fob. $0.99 x 2 is not too bad, me still clicking the unlock button on the fob. These gremlins have been bothering me, me still clicking the unlock button on the fob. I wonder where the nearest Subaru dealer would be, me clicking the panic button, literally and figuratively.

Ah, thar she blows her alarm! I knew how Dr Frankenstein felt when the monster awakened, because I heard the sweet, sweet sound of the panic button setting off the car alarm. It was quite the therapy session. The rest of the buttons worked too. It just so happens that the unlock button doesn’t make much of a sound, so I hadn’t been hearing it earlier.

So… what happened here?

It feels weird that the solution to this problem was a stupid button cell, and while I am delighted that the remote works, I am left wondering what happened here. There are three potential answers I can think of:

  1. The CR1620 cell I bought on Amazon was dead on arrival. This is the most plausible explanation, since I checked the old cell with a voltmeter and it read something like 2.6V when it should be closer to 3V.
  2. Other mechanical fixes I worked on involved the Indian Mjolnir, aka a big hammer. I doubt this is the cause because the electricals are generally well put together.
  3. Jesus?

Case of the mysterious slow down

About a month ago my Alienware suddenly slowed down. Like, really, really badly. To the point where Firefox would take ages to open the google.com landing page.

Now, I had attributed this to Firefox itself being slow, as was reported elsewhere, but something did not feel right. Switching windows (Alt+Tab) to a regular terminal was also terribly slow: about 5 seconds just to switch. This surely can’t just be Firefox acting up, can it?

1. Rule things in and rule things out
So I decided to dig. First off, “top” showed ~100% CPU consumption, so something was consuming all 4 cores. It was always Firefox. I tried different websites, including YouTube, WordPress, and Reddit, with similar results. So I tried Chrome and it was darn near the same. Then I tried other applications. I deliberately avoided LibreOffice as it might genuinely be a slow application. VLC tends to be low on CPU but reasonably high on GPU consumption, and even VLC showed similar behaviour. What this told me was that the whole system was running dog slow.

2. So is it a CPU problem or something else?
To test this theory I checked whether it was just the CPU or whether the GPU was behaving similarly. A quick OpenGL program with low CPU usage but heavy fragment shader usage ruled the GPU out. The CPU was still pegged, although not on all cores. I then checked disk I/O and it was close to nonexistent. RAM wasn’t an issue either, as usage always stayed below 50%.

3. So is it malicious?
Around the same time I was reading about compromised systems. This bothered me as it could mean something really, really nasty. I know, I made a leap based on nothing, but it was worth ruling out. Thankfully Linux Mint makes this rather simple. My /home is on a separate partition, so all it meant was wiping /, reinstalling the OS, creating a dummy account, and retesting. It took about 20 minutes to do all of this. Nope, still slow. So either this was really, really deep or I needed to stop heading down this path. I decided to stop.

4. Is it hardware?
Power settings could lead to this kind of behaviour. So could thermal protection, if it decided to kick in. I checked the CPU governor and forced it to “performance” mode. Alongside that, I opened up the laptop and cleaned out all the dust, because the CPU temperature was hovering at 65C. Not too hot yet, but might as well bring it down. This definitely helped and brought it down to 45C at peak 100% load. I was prepared to take the CPU heatsink off and apply a dab of thermal grease, but decided against it: it is better to first verify whether the heat came from constricted airflow or from poor thermal contact between the CPU and the heatsink. Anyway, not opening up the heatsink was the right call.

5. Is it still the hardware?
Now that I had ruled almost everything else out, I wanted to pay closer attention to the CPU itself. Since neither Firefox nor Chrome is likely to have performance bugs this severe, I wondered if something else was going on. This is when I started looking at the cpufreq tools.

I paid close attention to the output.
$ cpufreq-info

analyzing CPU 3:
driver: intel_pstate
CPUs which run at the same hardware frequency: 3
CPUs which need to have their frequency coordinated by software: 3
maximum transition latency: 0.97 ms.
hardware limits: 800 MHz – 2.90 GHz
available cpufreq governors: performance, powersave
current policy: frequency should be within 800 MHz and 2.90 GHz.
The governor “powersave” may decide which speed to use
within this range.
current CPU frequency is 870 MHz.

At the same time, I noticed that the CPU was pegged at 100%.

Here comes the AHA! moment. While the OS correctly recognised the CPU frequency range, it reported the current CPU frequency as 870 MHz: way below what it should be at 100% utilisation. That got me thinking. What could this mean? I tried to force the frequency to 2.90 GHz, but it was ignored. This was the second AHA! moment. The OS didn’t seem to be able to set the frequency. So who could? The CPU itself, the power settings, the thermal settings, or something else? I had already ruled out the CPU itself, the power settings, and the thermal settings, purely by coincidence, during the earlier steps. So what could it be? Maybe something lower than the OS: the BIOS.
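For anyone who wants to script the same comparison: the numbers `cpufreq-info` printed are also exposed by the Linux kernel under `/sys/devices/system/cpu/cpuN/cpufreq/` (`scaling_cur_freq`, `cpuinfo_max_freq`, `scaling_governor`, with frequencies in kHz). Below is a minimal Python sketch, assuming a kernel with cpufreq support. The helper names and the "busy but below half of max" threshold are hypothetical choices, not anything the cpufreq tools define, and the script only reports; it does not try to change the clock.

```python
from pathlib import Path

# Linux cpufreq sysfs tree; the frequency files there hold values in kHz.
CPU_ROOT = Path("/sys/devices/system/cpu")


def looks_throttled(cur_khz: int, max_khz: int, busy_pct: float,
                    busy_threshold: float = 90.0, ratio: float = 0.5) -> bool:
    """Heuristic: a busy CPU running below `ratio` of its hardware max is suspicious.

    Both thresholds are arbitrary assumptions, not values from any spec.
    """
    return busy_pct >= busy_threshold and cur_khz < ratio * max_khz


def report(busy_pct: float) -> None:
    """Print current vs. maximum clock and the governor for each CPU."""
    for cpu in sorted(CPU_ROOT.glob("cpu[0-9]*")):
        freq = cpu / "cpufreq"
        try:
            cur = int((freq / "scaling_cur_freq").read_text())
            hw_max = int((freq / "cpuinfo_max_freq").read_text())
            governor = (freq / "scaling_governor").read_text().strip()
        except (OSError, ValueError):
            continue  # no cpufreq for this CPU (common in VMs and containers)
        flag = "SUSPICIOUS" if looks_throttled(cur, hw_max, busy_pct) else "ok"
        print(f"{cpu.name}: {cur // 1000} MHz of {hw_max // 1000} MHz, "
              f"governor={governor}, {flag}")


if __name__ == "__main__":
    # The busy percentage is supplied by hand, e.g. the ~100% that top showed.
    report(busy_pct=100.0)
```

With the numbers above (870 MHz current, 2.90 GHz maximum, 100% busy), `looks_throttled(870_000, 2_900_000, 100.0)` returns `True`. The busy figure still has to come from `top` or a similar tool.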

6. It is the hardware, but not the way I thought.
So I went into the BIOS, checked the Intel SpeedStep setting, and disabled it. Restarted. No change. I then decided to just follow The IT Crowd philosophy and reset it to defaults.

7. Eureka!
…and I got a warning when I started the laptop. This was the message: http://www.dell.com/support/article/us/en/19/SLN148385
But I was using the Alienware adapter. What the hell is it on about? None of the resolutions listed on the page made any sense. No wait.
“The AC adapter is not fully connected to the system.” Huh? Are you essentially saying a loose contact? Wait what?
So I started twisting the cable and restarting to see if it made any difference. And it did! If I held the wire straight out, I didn’t see the warning message. So, inspired by MacGyver, I came up with a way to hold it in place. It also reminded me of the TV antenna vaastu orientation.

You can see how it looks now at the end of the post.

8. And we have a winner!
With the binder clips holding the cable and no warnings in my face, I restarted the system. It worked. It damn worked. Everything was back to the speeds to which I was accustomed. Now granted this whole binder clip setup is a hack. The true problem, quite possibly, is that I have a loose solder joint somewhere.

In summary:
1. MacGyverism is a way of life
2. Thermal characteristics hurt performance. Spring cleaning is valuable
3. Never ignore warnings, whether they come from hardware or a compiler.
4. Not all Eureka moments involve water, a bath tub, and a naked guy running out into the streets

 

MacGyver

The Car is an Engine

Over the past few weeks I worked on my 2008 Nissan Maxima. While I had changed the spark plugs late in 2015, I never thought about publishing a post on the experience.

I am a Software Engineering Manager and a programmer. One of the paradigms for writing programs is Object Oriented Programming, and almost every book on OOP starts off with the Car, Vehicle, and Engine analogy. Hence my interest in posting this. Here I list a few lessons I learned from fixing my car. Read them with software engineering in mind and you’ll see them in a completely different light.

  1. Use the manuals for truth. Use other sources for context. Use a wrench for experience. For the Maxima I rely on the Nissan manual; it contains the absolute truth as seen by the Maxima designers. However, forums like My6thGen have been unbelievably valuable sources for research. Pictures posted by members have in many cases helped me figure out orientations. Eric the Car Guy, for me, is the John Carmack of cars. However, the best teacher is holding that wrench and getting to work, much like in programming. This became evident with torque specs, with angles my fingers could not fit into, and with having to create chimeras out of my limited tools.
  2. Plan, Plan, Prepare, and Execute. Following the manuals and the videos is good, but at the end of the day you need the tools, materials, and parts to get the job done. In many cases I had to order parts from online stores like Rockauto and Courtesy Parts. That meant shipping, which in turn meant preparing slots of 3-4 hours over the weekend. This in turn meant I needed to clear weekends of grocery shopping. It is extremely important to plan the parts, the tools, and the fluids into the schedule. Once you have the parts it is time to prepare, and this becomes critical. Prepare yourself with the steps that need to be executed before you execute them. Unlike in programming, an undo or revert generally involves a tow truck, and that is painful and expensive. Once we start executing, the only way is forward. There is no undo, and I cannot stress this enough. This became extremely important when I was replacing the Clockspring/Spiral cable. Since the job involved taking out the air bag on the steering wheel, I had to pay extreme attention to disconnecting the battery and letting the system discharge, ensuring I was always electrically grounded, and storing the air bag in a safe place. I ended up researching, planning, and rehearsing for close to 3 days before I even started the job.
  3. Following the earlier point, all the planning in the world won’t prepare you for the surprises. Accept it, and deal with them when they occur. Working on cars, or for that matter software, brings experience. The difference between a newbie (I was one) and a pro (I am not yet) is that a pro makes far fewer mistakes than a newbie. A pro will also adjust quickly to the surprises that jump at you. One such experience I had was when I was trying to replace the regulator on my driver side window. There was a bolt I had to put back, and it was painful to get it into the slot. My fingers wouldn’t reach in, and the socket wouldn’t hold it due to the angle. Worse yet, when the bolt fell into the door assembly it was hard to retrieve. So the second time the bolt fell into the door chamber, I stepped back to solve this problem. I tried a magnet, but that wouldn’t work as the door is made of metal. Finally I solved this using tissue paper. Apparently this is a well known trick, so yeah, I reinvented the wheel. It felt good.
  4. Tools. Oh God! Tools. Invest in a good collection. I own a Craftsman set, which I expanded by buying a few extensions, swivels, a spark plug socket, and a few hex bits. The number of times I’ve missed a ratcheting wrench has been fairly limited, but I know that when it becomes painful I’ll buy a set. Tools generally make you efficient. As I mentioned in the earlier point, not all tools are manufactured or ready made. Some, like the tissue paper trick, come from experience. Do not be defined by your tools. You can swear by Craftsman or Snap-On, but on any given day all they do is turn a bolt when you use them.
  5. Contingency plans. Have fun without contingency plans. And then have fun when an event defies your Plan A, Plan B, … Plan Z extended to UTF-8. All the plans you make, including contingency plans, will be shattered. The ones that do not get shattered are the ones you had planned for. Keeping this in mind, make sure to think things through. For example, when I was changing the CVT fluid, the brand new oil collection pan I bought as Plan A had its drain hole plugged. Not knowing this, I ended up with some fluid on the floor when the oil overflowed. I had a Plan B, and it involved cardboard sheets on the floor. Plan C involved rolls of tissues. What finally happened? The oil was hot and I could not get everything covered in time. Since some of the oil ended up on the floor, I had to find out how to fix the mess. Turns out kitty litter is an answer. A generous amount of degreaser and a trip to Walmart for some kitty litter fixed the problem. The floor is clean. Had I not made Plan B and Plan C, I’d have spent a lot more time cleaning the garage floor.
  6. DIY for best results. I can relate to this as a manager. There are too many instances where I was frustrated with the code or the output, but at the end of the day, if I want something done to my satisfaction, I should be ready to do it myself. Otherwise it is a conscious trade-off between priorities. Some time ago, when I was visiting my brother during the winter, my Maxima’s driver side window just rolled down and would not roll up. My brother did not have any reasonable tools on hand and didn’t have a garage to work in. So I did the next best thing: took it to a mechanic. Two, in fact. $200 later, I still had a non-functioning window. Since I had to drive back, I ended up holding the glass in place with Gorilla tape and painter’s tape. The 10-hour drive back was a nightmare, with the wind howling in my left ear. I can relate to Gollum now. Anyway, once I returned home I ordered the parts from Rockauto and fixed it in 1 hour. The correct way. Which loops back to the first point.

I hope this little write up, taken in two contexts, conveys my philosophy in working on cars and in writing code. I’ll conclude this by saying that my brother would rather buy a car someone else can fix easily and is willing to pay the price for it. You know, not everyone wants to be a mechanic and not everyone wants to be a programmer.

 

Credits/Thanks to:

Lameo from work for being a great teacher

My wife for the kitty litter trick

My Son for making me think through every step by explaining it to him

/u/kowalski71

John Carmack for, well, you know

Linux Mint (Cinnamon) multi display error

Failed to apply configuration: %s
GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SettingsDaemon was not provided by any .service files

My wife started getting this error on her login. Other symptoms included no text displayed in the terminal, no font rendered for icon names, and the other display receiving no signal in spite of being listed as connected, even though the login session was actually using both displays. That last point was a hint toward my final solution.
Anyway, I tried a bunch of things but the most important thing was to check the displays themselves.
$ xrandr --current
This should show that the displays are actually connected. If they don’t show up, proceeding might not yield any results.
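If you want to check this from a script, here is a small Python sketch that picks the connected outputs out of `xrandr --current`. It assumes the usual xrandr text format, where each output line starts with the output name followed by `connected` or `disconnected`; `connected_outputs` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import List


def connected_outputs(xrandr_text: str) -> List[str]:
    """Return the output names that xrandr reports as connected.

    Output lines look like 'DVI-D-0 connected primary 1920x1080+0+0 ...'
    or 'HDMI-0 disconnected ...'; the second field tells them apart.
    Screen and mode lines never have 'connected' as their second field.
    """
    outputs = []
    for line in xrandr_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "connected":
            outputs.append(fields[0])
    return outputs


if __name__ == "__main__":
    if shutil.which("xrandr") is None:
        print("xrandr is not installed")
    else:
        result = subprocess.run(["xrandr", "--current"],
                                capture_output=True, text=True)
        # Without an X display xrandr exits non-zero; show its error instead.
        print(connected_outputs(result.stdout) if result.returncode == 0
              else result.stderr)
```

If both monitors are plugged in but only one name comes back, the problem is below the desktop configuration and deleting user settings is unlikely to help.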
I tried a bunch of other things, including updating the NVIDIA drivers and messing with the NVIDIA settings. On a hunch, since the errors came from the user configuration, I started poking around the user configuration files. I finally deleted the configuration directories themselves and let Mint/GNOME recreate them with defaults. That worked!
Here’s what I deleted in the user’s home directory, in order:
1. .cinnamon
2. .dbus
3. .linuxmint
4. .gnome2
5. .gnome2_private
6. .config
7. .gconf

The run that worked for me had all 7 directories deleted. I cannot tell which combination actually did it, only that deleting all 7 together worked.
My guess is that either #6 or #7 fixed it for me. If anyone else is facing this problem, could you delete the directories in reverse order and post here?
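For anyone willing to run that experiment, here is a hedged Python sketch that walks the list above in reverse order and renames one directory per run instead of deleting it, so each step can be undone by renaming it back. The `.bak` suffix and the helper names are hypothetical. Keep in mind that `.config` holds settings for many applications besides Cinnamon, which is another reason to rename rather than delete, and that it is presumably best run while the affected user is not logged into the broken session (e.g. from another account or a text console).

```python
from pathlib import Path
from typing import Optional, Set

# The seven directories from the list above, in the order they were deleted.
CONFIG_DIRS = [".cinnamon", ".dbus", ".linuxmint", ".gnome2",
               ".gnome2_private", ".config", ".gconf"]


def next_to_back_up(existing: Set[str]) -> Optional[str]:
    """Pick the next candidate, walking the list in reverse order."""
    for name in reversed(CONFIG_DIRS):
        if name in existing:
            return name
    return None


def back_up_next(home: Path, suffix: str = ".bak") -> Optional[Path]:
    """Rename (not delete) the next candidate so the step can be undone.

    Directories that already have a backup are skipped, so a directory the
    session recreated after an earlier run is not touched twice.
    Returns the new path, or None when there is nothing left to try.
    """
    candidates = {name for name in CONFIG_DIRS
                  if (home / name).is_dir()
                  and not (home / (name + suffix)).exists()}
    name = next_to_back_up(candidates)
    if name is None:
        return None
    target = home / (name + suffix)
    (home / name).rename(target)
    return target
```

Call `back_up_next(Path("/home/<user>"))` once, log in and check for the error, and repeat until it goes away; the directory renamed in the last run is the likely culprit.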

Eclipse, SDL and linking HELL

I am using a 64-bit Linux Mint system.
I started writing some base code using SDL2 and wanted to try out Eclipse; you know, an IDE and stuff instead of good old vi. The way I normally include and link SDL2 in my application is to pass `sdl2-config --cflags` to CXXFLAGS and `sdl2-config --libs` to LDFLAGS.
Eclipse, in all its glory, gave me a way to pass the include flags:
Project->Properties->C/C++ Build->Settings->GCC C++ Compiler->Miscellaneous->“Other Flags”, and just append `sdl2-config --cflags`
This works.
Then came the linking part: I kept getting an “unresolved symbol” error for SDL_Init, and no matter what I tried it did not work. After a lot of fighting I realised that the flags in
Project->Properties->C/C++ Build->Settings->GCC C++ Linker->Miscellaneous->Linker Flags / -Xlinker are placed *BEFORE* the .o files on the link line. What I needed was a way to provide `sdl2-config --libs` *AFTER* all the .o files are listed. So for anyone out there, the solution I have for now is to append `sdl2-config --libs` to the value in
Project->Properties->C/C++ Build->Settings->GCC C++ Linker->“Expert settings: Command line pattern”

Very inconsistent. I hope this gets fixed soon.
Of course, if anyone thinks there is a better way to do this, I’d welcome comments.
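Why the order matters, as I understand GNU ld: it processes its inputs left to right, so a static library only contributes code for symbols that are already undefined when the linker reaches it, and with `--as-needed` (which, as far as I know, Ubuntu-based distributions such as Mint pass by default) a shared library listed before any object that uses it gets dropped. Either way, `-lSDL2` in front of the .o files leaves `SDL_Init` unresolved. The Python sketch below only assembles a link line in the working order; `link_command`, `sdl2_config`, and `main.o` are hypothetical names, and this is not how Eclipse builds its command line internally.

```python
import shlex
import shutil
import subprocess
from typing import List


def sdl2_config(flag: str) -> List[str]:
    """Run `sdl2-config <flag>` and split its output into separate arguments."""
    out = subprocess.run(["sdl2-config", flag], capture_output=True,
                         text=True, check=True).stdout
    return shlex.split(out)


def link_command(objects: List[str], output: str, libs: List[str]) -> List[str]:
    """Build a g++ link line with the libraries placed AFTER the object files."""
    return ["g++", "-o", output, *objects, *libs]


if __name__ == "__main__":
    if shutil.which("sdl2-config") is None:
        print("sdl2-config not found; is the SDL2 development package installed?")
    else:
        # main.o is a hypothetical object file name.
        print(shlex.join(link_command(["main.o"], "app", sdl2_config("--libs"))))
```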

The case of mysterious frame drops

I play Team Fortress 2 on a regular basis and I own a monster of a system for what TF2 requires (read: Alienware M14X). I never had any problem running the game at a minimum of 60fps on the highest settings, and this has been the case for the past 3 years or so.

So about a month ago, when I started seeing weird behaviour, specifically massive frame drops, I wondered what was going on. It so happened that at the same time Valve released a patch that messed up the game. It caused crashes for a few, dropped frame rates for others, and straight-up rendering artifacts for many more. I had the latter two on my system.

This coincidence was the first thing that threw me off. I was genuinely bothered by the bugs in TF2, but I decided to leave the game alone for a week or so. So when I logged back in and saw a patch, I was happy because it was my saving grace, right? Wrong. While it definitely fixed the bugs and artifacts, I still had terrible frame rates: 6fps, 30fps at best, whenever other players were visible. The weird part was that I was hitting 120fps when I faced away from everyone in the spawn room. I could not attribute this behaviour to anything I understood.

Getting all technical:

A long time ago I worked on Xbox 360 optimisations for the game we were working on. So I thought, well, TF2 is no different. Off I went and downloaded Intel GPA. It was rather simple when I last used it about 4 years ago, but now it was very different and a lot more complicated. So I started with the tutorials, and quite frankly Intel provides some very nice ones. It took me about a day to pick it all up, and then I started working on TF2.

The first thing I did was set up a trigger for when the frame rate dropped below 25fps. That did not take too long. The question I wanted to answer was whether I was seeing a CPU or a GPU bottleneck. The GPU was taking close to 110ms to render a frame, and that made no sense; the GPU can certainly handle those flimsy shaders. So I went into the game, turned all the features down, switched to DX8, and started the game again. It certainly wasn’t the same scene I had been profiling, but I did notice that the GPU was still around the 100ms mark. This made no sense, so I started looking at the stall times. Bingo: there were crazy amounts of stalls and idle calls.
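The 110ms figure is easier to judge as frame-time arithmetic: at a frame rate f, each frame gets 1000/f milliseconds. In the tiny Python sketch below (the function names are hypothetical), 60fps leaves about 16.7ms per frame, the 25fps trigger corresponds to 40ms, and a GPU frame of 110ms alone caps the game at roughly 9fps, which is in the range of the worst frame rates described above.

```python
def frame_time_ms(fps: float) -> float:
    """Time budget per frame, in milliseconds, for a given frame rate."""
    return 1000.0 / fps


def max_fps(frame_ms: float) -> float:
    """Upper bound on frame rate when one pipeline stage takes `frame_ms` per frame."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    print(f"60 fps -> {frame_time_ms(60):.1f} ms per frame")
    print(f"25 fps -> {frame_time_ms(25):.1f} ms per frame (the capture trigger)")
    print(f"110 ms -> at most {max_fps(110):.1f} fps")
```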

So then I looked at the CPU. I could not get any good counters going here, but Valve’s in-game HUD did the trick. There were a significant number of draw calls, but nothing that really mattered. It appeared that the CPU was not pushing enough work into the pipeline. This was a mystery, because I knew the CPU was quite beefy for the game.

Could this be another case of buggy Valve code?

It was a compiler error!

No, it is not a compiler error, but I’ve come across too many people who attribute buggy code to compiler errors. I did encounter a compiler bug, but that was when we were still working with early compiler releases for the Xbox 360. I never encountered one after that.

However, blaming Valve for buggy code is similar to blaming compiler errors. When Valve pumps out buggy or slow code, enough people complain that the forums light up like an Alien on a Marine’s motion tracker. None of that happened. At this stage I was quite sure something was amiss on my system.

To confirm this I installed TF2 on my Desktop (Pentium D + Geforce 9800GT). That little bugger ran the game at the expected 120fps on lowest settings. This was when I decided to check what else changed on my laptop.

I suspected the CPU at this stage to be the primary bottleneck, so I wanted to benchmark it first. I downloaded SiSoft’s benchmarking tool and ran it. Then I compared the results to the numbers posted for my CPU, an Intel i5 2450, and bingo, something was really wrong with the CPU. It just wasn’t performing well.

Was it a Flu?

With the flu season going around, I suspected a virus infection (wink wink @ Independence Day). There are only so many things that could cause it. I could safely rule out the antivirus (Avast) because I had actually placed exceptions for TF2 and its subdirectories. I did a system-wide scan and found no viruses either. I could safely rule out the flu for my system.

At this stage I installed Linux Mint anyway, because I was working on Firefox OS development and working in the VM was getting rather painful. So I installed TF2 on Linux and noticed very similar behaviour. I wondered whether the NVIDIA Optimus drivers were messing around, but I knocked this off by checking the GL strings: it was the NVIDIA GPU that was active.

Eureka!!

Now I must attribute this find to something completely unrelated to any of the profilers. While I was working in Linux I started a 1080p video (one of the GTC videos). After about 10 minutes of playback the system just shut off. This was weird, and I wasn’t sure whether my son had knocked a cable out or something (I don’t run on battery). I immediately checked the power settings to make sure the system was not holding any CPU cycles back. No luck there either.

At this stage I was out of ideas, but the profiler and benchmark numbers kept bothering me. Why was the CPU a bottleneck? So today, when I started the GTC video again and the system shut down, I wondered if this was due to the CPU running too hot.

I installed gkrellm and noticed that the baseline temperature was ~75C. Now this is ridiculous considering it is winter and we live in the Midwest; my office room is 20C. While CPUs do run at higher temperatures, this much of a difference on a cold-started system made no sense to me. Unless, of course, the CPU was not being cooled enough.
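gkrellm shows this graphically; to get the same number from a script, Linux exposes thermal sensors under `/sys/class/thermal/thermal_zone*/`, with a `type` file naming the sensor and a `temp` file in millidegrees Celsius. Below is a minimal Python sketch, assuming that sysfs layout. Zone names vary from machine to machine, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Linux thermal sysfs class; each zone has a `type` and a `temp` file.
THERMAL_ROOT = Path("/sys/class/thermal")


def millideg_to_celsius(raw: str) -> float:
    """Convert a sysfs `temp` reading (millidegrees Celsius) to degrees."""
    return int(raw.strip()) / 1000.0


def read_zones() -> Dict[str, float]:
    """Return {"thermal_zoneN (type)": degrees C} for every readable zone."""
    temps: Dict[str, float] = {}
    for zone in sorted(THERMAL_ROOT.glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temps[f"{zone.name} ({kind})"] = millideg_to_celsius(
                (zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # some zones refuse reads or report nothing useful
    return temps


if __name__ == "__main__":
    for name, celsius in read_zones().items():
        print(f"{name}: {celsius:.1f} C")
```

Running it once at idle and again while the 1080p video plays gives the same before/after comparison as watching gkrellm; a `temp` value of 75000 means 75C.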

Happy ending:

So I opened up the laptop and, sure enough, there was enough dust in there to cause a mutation and prove to everyone, once and for all, that evolution is real. So I cleaned out the fan with a vacuum, restarted the system, and checked the temperature. Right now, as I am typing this up, I see 38C. When I started it, the temperature read 35C.

My next test was to start the GTC video again. The temperature bumped to 45C and the GPU stood at 38C. Those are more like the numbers I am used to. I also noticed that the system was rather quiet.

For the ultimate test, I started TF2 (in Linux) and, well, guess what: I can get back to being a Sniper. A steady 60fps. So, children everywhere, learn from my mistakes. Do give your computer a good clean at least once a year. That, and stay in school, until someone has better advice.