Is this Blog Spamming at a Whole New Level?
Posted on Dec 06, 2006
Ever since I simplified my CAPTCHA I have been getting some comment spam: not extensive, but frequent enough that I made some code modifications to try to end it. Some recent comment spam, however, has taken this to a whole new level. I am going to remove the comments from the post, but I will include some excerpts here to show how insidious this spam is; it took a very close look to distinguish it from a wholly legitimate comment. Since this is my first experience with this, I thought I would document it for other bloggers to look out for.
Let's look at an example. The first comment came in a couple of days ago on an older post of mine (the first in my objects and frameworks series). It read like so:
I had such a conversation with Sean Tuesday. In short, my question is "why do we go to all this trouble when the life of a page is only 54 ms?" And yes, Sean was blunt and really didn't answer the question. I'm already a big fan of OO, but when it comes to web applications, I don't really get the point of OO. So in a way, I am looking for "Rosetta Stone" that would help me wrap my mind around web OO.
Your post makes perfect sense and now that I feel I'm not alone in the way I think, it makes it esasier to excetpt what I don't yet understand.
Thanks for a giving me the guiding light that will let me take my mind off the 'stone' and get back to web OO. I look forward to collection a few coins
First, by its sheer length, one wouldn't pick it out as spam. Second, it actually seems both to make sense and to stay on topic. I didn't bother responding, and I didn't think much of it until someone responded a couple of days later.
Firstly, how big are your applications? If you're writing a single page form processor I think it's safe to say that Mach-II with Coldspring and Reactor might just be a bit of overkill.
Start with something simple. Do you have the same query in more than one place? Do you ever change that query? If so, why not learn how to stick it into a cfc and call that so you only have to change it in one place (if you don't have more than one copy of a query or you never ever change your queries, even this isn't necessary).
Each step into OO land is designed to solve a specific problem. While it is nice to try to "get" OO, it is sometimes better to write and then refactor (get it working, then get it better).
And the fact is that web applications don't last 54 ms - they last years and years of having to maintain them. That is why writing maintainable code is a good idea. OO (when done right) just makes it easier to maintain your code (you're certainly not going to add all the overhead for performance purposes!)
Well, again, not obvious spam. It is on topic and appears to be posted in response to the earlier comment. However, one thing caught my eye: the first comment was from a "Handbag P" and linked to a suspicious-looking subdomain, and the second was from a "Loan C" and linked to a different subdomain of the same domain. Upon closer inspection, the two email addresses, while not the same, were suspiciously similar. (Just to note, I had not paid close attention to the names or domains previously.)
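The check that gave these two away (different names, same parent domain) is easy to sketch in code. What follows is only a rough illustration in Python, not something my blog actually runs: the function names and the example.com URLs are made up, and the "last two labels" rule is a naive assumption that breaks on suffixes like co.uk.

```python
from collections import defaultdict
from urllib.parse import urlparse


def parent_domain(url):
    """Return the last two labels of the host, e.g. 'a.example.com' -> 'example.com'.

    Naive assumption: ignores multi-part suffixes such as 'co.uk'.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def shared_domains(comments):
    """Map each parent domain to the commenter names that link to it.

    `comments` is a list of (name, url) pairs. Only domains used by more
    than one distinct name are returned, since those are the suspicious ones.
    """
    by_domain = defaultdict(set)
    for name, url in comments:
        domain = parent_domain(url)
        if domain:  # skip comments with no usable link
            by_domain[domain].add(name)
    return {d: sorted(names) for d, names in by_domain.items() if len(names) > 1}


# Hypothetical data shaped like the two comments described above.
comments = [
    ("Handbag P", "http://handbags.example.com/"),
    ("Loan C", "http://loans.example.com/offers"),
    ("Jane", "http://janesblog.example.org/"),
]
print(shared_domains(comments))
```

A match like this is a hint rather than proof, since plenty of legitimate commenters share a blog host.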
After I showed this to a friend, he caught something else. If you look at the earlier comments on the post, all these two did was copy prior comments with minor editing. However, the post was old enough that I didn't recall those comments offhand and might never have noticed.
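What my friend spotted by eye could, in principle, be checked automatically by comparing each new comment against the earlier comments on the same post. Here is a minimal sketch in Python using the standard library's difflib; the function names and the 0.8 threshold are my own assumptions for illustration, not a tested spam filter.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial edits don't hide a copy."""
    return " ".join(text.lower().split())


def most_similar_earlier(new_comment, earlier_comments):
    """Return (best_ratio, index) of the earlier comment closest to the new one.

    The ratio is difflib's similarity score between 0.0 and 1.0.
    The index is None when there are no earlier comments.
    """
    best_ratio, best_index = 0.0, None
    new_norm = normalize(new_comment)
    for i, old in enumerate(earlier_comments):
        # autojunk=False keeps long comments from being scored too low.
        matcher = SequenceMatcher(None, new_norm, normalize(old), autojunk=False)
        ratio = matcher.ratio()
        if ratio > best_ratio:
            best_ratio, best_index = ratio, i
    return best_ratio, best_index


def looks_copied(new_comment, earlier_comments, threshold=0.8):
    """Flag a comment that is a lightly edited copy of an earlier one.

    The threshold is an assumed starting point and would need tuning.
    """
    ratio, _ = most_similar_earlier(new_comment, earlier_comments)
    return ratio >= threshold
```

Flagged comments would still deserve a human look, since a reader quoting an earlier comment at length would trip the same check.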
So I am left with one little conundrum. You see, when I had the complicated CAPTCHA, I received no, as in zero, comment spam. I simplified the CAPTCHA based on a few user requests, and now I am getting anywhere from one to a handful a day. My guess is that they have to be doing this manually, and one can only wonder how it could be worth the effort. I already had to get rid of the subscribe option on the blog because it was simply a haven for spammers. Cleaning up was easy when it didn't take a lot of effort to distinguish spam from non-spam, but at this point I am tempted to put the complicated CAPTCHA back in place...thoughts?
Comments
Another alternative is comment moderation. It will be in BlogCFC this week.
Posted By Raymond Camden / Posted on 12/06/2006 at 9:22 PM
Was the email a valid email address? I wonder if anyone out there might have some sort of CFC that could validate an email address against the server. At the least then you could ping the server to see if the email is valid, if it isn't, get rid of the comment. Not the greatest solution, but could help if the email address is bogus.
BTW this is a post from Loan D
:)
Posted By Tony Petruzzi / Posted on 12/06/2006 at 10:18 PM
Puts me in mind of the "poetry" spam comment I got a while back:
http://corfield.org/entry/Sounds_how_the_factory__inadvertent_poetry
Posted By Sean Corfield / Posted on 12/07/2006 at 1:03 AM
I've found that simple session detection works wonders for bot-based comment spam, while a combination of regex and Bayesian filtering works for human-based spam.
Posted By Michael Dinowitz / Posted on 12/07/2006 at 10:49 AM
When Lyla Captcha first came out, Mr. Farrell said that some people will actually download the captcha image and run it through an OCR program to decode it, and then they can spam you... like this time, I only had to put in "uq". Without the background jumble, OCR can probably decode it.
Posted By Michael White / Posted on 12/07/2006 at 1:23 PM
fyi ... Wired Magazine had a very thorough feature about this in their September issue:
Spam + Blogs = Trouble
Splogs are the latest thing in online scams and they could smother the Internet.
http://www.wired.com/wired/archive/14.09/splogs.html
Per the article, Akismet currently may be the best defense. Details at http://akismet.com/faq/
Seems the "evil doers" are far from defeated yet. On the email front the New York Times yesterday reported about how it is getting worse there too:
Spam Doubles, Finding New Ways to Deliver Itself
http://www.nytimes.com/2006/12/06/technology/06spam.html
hth,
g
Posted By greg h / Posted on 12/07/2006 at 1:28 PM
Brian,
Funny, as soon as I started reading your excerpts, I thought, "Wait, I've read this comment!" And then the next thought I had was that a spammer could easily automate the process of grabbing a random comment and appending some garbage to it - kind of what they're doing to beat the Bayesian filters on email.
It's going to be a see-saw, as with spam mail, but for the time being, it would be great if one of you Flash types could enhance the Lyla Captcha by rendering a Flash image instead of a gif. (I was thinking first of an AJAX call to render the image the hidden form value, but I guess some 'Bots are now JavaScript aware?!) A nice little pulsing trio of letters in Flash would be hard for 'Bots to read but pretty easy for humans.
As for the manual spammers, sheesh - tough way to make a living, but I guess from the spammer's point of view, if you can get your kid brother to sit around all day with a list of blogs....? I can't believe it is going to be sustainable to do that for spammers. But I've been stunned by human stupidity before.
/ejt
Posted By Edward T / Posted on 12/09/2006 at 10:05 AM
Well, the spam posts have increased considerably, even in my knowledgebase. I am usually spammed by adult content which makes practically no sense.
Posted By cheap web hosting / Posted on 12/11/2006 at 2:49 AM
Well, in the end I used some additional recommendations Charlie Arehart had to up my CAPTCHA ever so slightly. Will see if it works.
By the way, "Cheap Web Hosting"...your comment is on topic and doesn't appear to be clearly spam, so I am leaving it...but using that tag line for your name makes your posts seem like, at the very least, borderline comment spam. I don't mind people linking to their site obviously (even if it is a business site), but the tag line seems to me to make it cross into a definite spam gray area.
Posted By Brian Rinaldi / Posted on 12/13/2006 at 11:58 AM
Have you considered switching to the "Click on the picture of [something]" and presenting the user with three pictures to choose from?
It seems so much easier for a human to process.
Also, perhaps it's time to flex-captcha... You could still display the same graphic, just call it through flex and have the text input in flex too. Most spammers rely on images and HTML... I bet they're too lazy (for now) to try and read a swf.
Posted By Shannon Hicks / Posted on 12/14/2006 at 9:35 AM