Baby you can drive my car.

Algorithms Gone Wild!

This is the kind of story that would make a great movie: a story of how one algorithm corrupts another to create nonsense and then threatens mankind.

One of our clients, who manufactures particularly durable security yard signs and decals, noticed that some strange words were showing up under their name on Google Maps: “Yard signs” (this was OK, that is what they make) and “Psychology public”. This made no sense, so we started investigating.

Algorithm Number 1 - dumb

Figuring that it had something to do with the automatically generated Google Plus profile for the company, we looked there and noticed that one section, “Reviews from around the web”, had two strange domains listed: pagespan(dot)com and rankinsider(dot)com. We could not figure out what these sites had to do with the company, so we visited them.

Algorithm Number 2 - dumber

It was at these sites that we found the “psychology” reference, in a section of the site “profile” called “Keywords hit in search results”. Both domains turned out to carry exactly the same information. (It is also published as pageinsider dot com, rankdirection dot com and siteglimpse dot com. Why do we even worry about duplicate content penalties when you see schlock like this getting referenced?)

So Google’s algorithm sees “psychology public” on two of these sites and just goes with it. How on earth can these sites be considered authoritative in any way?

The old joke “To err is human, but it takes a computer to really screw things up” has never been more true.

Whoever runs these sites must use web crawlers (like a search engine does) to gather information about websites. Their website says they “developed sophisticated algorithms and methods to effectively calculate the rank and information about any website”. But it is here that my client was confused with a publisher of psychology books. So much for their “advanced algorithm”.

When One Algorithm Trusts Another

When Google Plus makes automatic profiles, its algorithm goes out and tries to gather as much information as it can find. Unfortunately, it cannot judge well whether the information it finds is accurate. I would have thought that Google would not even pay attention to these kinds of “aggregation sites” that produce no real value. Any human would look at the pages created by the “aggregation sites” and see them as useless.

So what do we do?

This is different than trying to get a bad review removed. Google would not do that anyway; they figure that is your problem. They may be right. But this is clearly just a screw-up by the cheesy algorithm at pagespan dot com (and whatever else it publishes under). My guess is that if we had a lot more offsite references to the company, this might not have happened. Who knows.

At the PageSpan site they mainly cloak themselves in a fair-use claim:

“All our content is for informational use only. We can use information found publicly on websites due to the Fair-Use clause of the Copyright Act of 1976, 17 U.S.C. § 107.”

They never mention anything about being confused, mixed up, or just plain stupid.

I get the feeling that the aggregator people don’t care at all about our problem. (Frankly, I don't even know why they exist.) I know Google does not care about specific problems like this. They're busy. Caring does not scale well, and Google is all about scaling up. (Does anyone have Google’s customer service number?) We will probably spend years waiting for this to sort itself out, which is about the only recourse you have when so much of the Web is on autopilot.

So how does this make me feel about Google’s driverless car? Crashes will still happen, but it would probably not be any worse than the way it is with regular drivers.

Beep beep ’m beep beep...yeah!