September 24, 2004

No soup for... who?

Alright, so apparently some bigwigs at Unicom here in town just called my house to ask why droves of their customers were getting 403 / Forbidden errors trying to access local news website bend.com.

Yeah, that's right, my house. Where do these people get my number?

I let them know that I had some Unicom IP addresses blocked out because there was a robot skipping around that IP address range requesting several pages per second during a period of time last week when the servers were being unnaturally slow, and thus more vulnerable to DoS. I'd blocked quite a few web spiders, including Ask Jeeves. The Unicom guy said that there were more than spiders on that IP address range: that was their entire network.

This week we made some performance improvements, so I removed the block on Unicom. We should be able to survive a rogue spider or three now. The Unicom guy thanked me and gave me some 24/7 contact information so that we can let them know immediately if any more problems originate from their network, and they can send out Vinnie and Guido to handle things.

Now, I am no proponent of "collateral damage" tactics. I was honestly bot-hunting the logs, and it fully looked like that was the IP range of a bot which was behaving pretty badly. And I know I might sound a lot like Slashdot, which I have recently boycotted for their reaction to people pulling their RSS feed... but this isn't an RSS feed I was fussing over; it is the spidering of full web pages, which serve ads from remote servers on every view. Our RSS feed could be pulled a thousand times as often as it is without bothering us.

Posted by jesse at September 24, 2004 11:56 AM
Comments

What was the user agent string of this spider?

Posted by: Jon at September 24, 2004 01:21 PM

I'd have to sift through the logs again to be absolutely sure, but like many shifty spiders it just had a standard-looking IE user agent string. However, what threw up a red flag for me is that it was requesting a large number of news articles randomly, with arbitrary article ids (some several years old). It was also repeat-requesting many of the URLs, and reported no referrer for any of its requests.
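For the curious, here is a rough sketch of how those red flags (lots of requests, repeated URLs, never a referrer) could be checked automatically. It is not the process I actually used; I was reading the logs by eye. The `flag_suspects` function, the `(ip, url, referrer)` tuple layout, and both thresholds are made up for illustration, and it assumes the log lines have already been parsed.

```python
from collections import defaultdict


def flag_suspects(requests, min_requests=20, min_repeat_ratio=0.25):
    """Return IPs that never send a referrer and re-request many URLs.

    `requests` is an iterable of (ip, url, referrer) tuples, with a
    missing referrer given as "" or "-". The thresholds are arbitrary
    illustrative values, not numbers from the real incident.
    """
    urls_by_ip = defaultdict(list)
    has_referrer = defaultdict(bool)
    for ip, url, referrer in requests:
        urls_by_ip[ip].append(url)
        if referrer not in ("", "-"):
            has_referrer[ip] = True

    suspects = []
    for ip, urls in urls_by_ip.items():
        # Skip low-volume clients and anyone who ever sent a referrer.
        if len(urls) < min_requests or has_referrer[ip]:
            continue
        # Fraction of requests that repeated an already-requested URL.
        repeat_ratio = 1 - len(set(urls)) / len(urls)
        if repeat_ratio >= min_repeat_ratio:
            suspects.append(ip)
    return sorted(suspects)
```

A check like this only narrows down candidates. As this whole episode shows, an address that looks like one bot can turn out to sit in a range full of real readers.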

I'm having a difficult time finding the incident again in my logs, but I know that at least one of the offending IP addresses came up on ARIN whois as part of a block belonging to Unicom, with that address itself registered to [blank]. (I think I remember who it was, but I don't want to mention any more company names unless I can get the logs back in front of me again.) So I nixed that /18 block and carried on.
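To put a number on how wide that net was: a /18 leaves 14 bits for hosts, so it covers 16,384 addresses. Below is a small sketch using Python's standard `ipaddress` module. The `10.20.64.0/18` range and the `is_blocked` helper are placeholders I made up for the example; they are not the actual Unicom block or my real server config.

```python
import ipaddress

# Placeholder /18 from private address space, NOT the real Unicom range.
BLOCKED = ipaddress.ip_network("10.20.64.0/18")


def is_blocked(address, network=BLOCKED):
    """Return True if `address` (a dotted-quad string) is inside `network`."""
    return ipaddress.ip_address(address) in network


# A /18 has 32 - 18 = 14 host bits, i.e. 2**14 addresses.
print(BLOCKED.num_addresses)
print(is_blocked("10.20.100.5"))   # inside 10.20.64.0 - 10.20.127.255
print(is_blocked("10.20.128.1"))   # just past the end of the block
```

Blocking the single offending address, or a much smaller range around it, would have been the gentler first move.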

Posted by: Jesse Thompson at September 24, 2004 02:17 PM

Gotcha, I'm not worried about naming names :) I was just curious whether it was reporting itself as a spider or was one of those masking itself.

Posted by: Jon at September 24, 2004 02:32 PM