Invisible spam pages on our website: how we locked out a hacker

TL;DR: A hacker uploaded a fake JPG file containing PHP code that generated “invisible” spam blog posts on our website. To avoid this happening to you, block inactive accounts in Drupal and monitor Google Search Console reports.

I noticed something odd on the library website the other day: a search of our site displayed a ton of spam in the Google Custom Search Engine (CSE) results.

google CSE spam

But when I clicked on the links for those supposed blog posts, I’d get a 404 Page Not Found error. It was as if these spammy blog posts didn’t exist except in search results. I thought this was some kind of fake-URL generation visible just in the CSE (similar to fake referral URLs in Analytics), but regular Google also showed these spammy blog posts as being on our site if I searched for an exact title.

spam results on google after searching for exact spam title

Still, Google was “seeing” these blog posts that kept netting 404 errors. I looked at the cached page, however, and saw that Google had indexed what looked like an actual page on our site, complete with the menu options.

cached page displaying spam text next to actual site text

Cloaked URLs

Not knowing much more, I had to assume that there were two versions of these spam blog posts: the ones humans saw when they clicked on a link, and the ones that Google saw when its bots indexed the page. After some light research, I found that this is called “cloaking.” Google does not like this, and I eventually received an email from Webmaster Tools with the subject “Hacked content detected.”
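
One rough way to test for this kind of cloaking is to request the same URL twice, once with a browser-like User-Agent and once with a crawler-like one, and compare the responses. The sketch below is only an illustration: the function names are hypothetical, and it assumes the cloaking keys on the User-Agent header. Cloaking that checks the requester's IP address instead would not be caught this way.

```python
import urllib.error
import urllib.request
from typing import Callable

BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code the server gives for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is what we want.
        return err.code


def looks_cloaked(url: str, fetch: Callable[[str, str], int] = fetch_status) -> bool:
    """True when a crawler-like request succeeds but a browser-like one does not."""
    return fetch(url, CRAWLER_UA) == 200 and fetch(url, BROWSER_UA) != 200
```

A result of False does not prove a page is clean; it only means this particular check found no difference.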

It was at this point that we alerted the IT department at our college to let them know there was a problem and that we were working on it (we run our own servers).

Finding the point of entry

Now I had to figure out if there was actually content being injected into our site. Nothing about the website looked different, and Drupal did not list any new pages, but someone was posting invisible content, purely to show up in Google’s search results and build some kind of network of spam content. Another suspicious thing: these URLs contained /blogs/, but our actual blog posts have URLs with /blog/, suggesting spoofed content. In Drupal, I looked at all the reports and logs I could find. Under the People menu, I noticed that 1 week ago, someone had signed into the site with a username for a former consultant who hadn’t worked on the site in two years.

Inactive account had signed in 1 week, 4 days ago

Yikes. So it looks like someone had hacked into an old, inactive admin account. I emailed our consultant and asked if they’d happened to sign in, and they replied Nope, and added that they didn’t even like Nikes. Hmm.

So I blocked that account, as well as accounts that hadn’t been used within the past year. I also reset everyone’s passwords and recommended they follow my tips for building a memorable and hard-to-hack password.
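
For anyone doing the same cleanup, the "not used within the past year" test is easy to script once you have each account's last access time. This is a minimal sketch that assumes you have already exported usernames and last-access dates (for example, from the People page); the function name and the sample accounts are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_accounts(last_access: Dict[str, datetime],
                   now: datetime,
                   max_idle_days: int = 365) -> List[str]:
    """Return usernames whose last access is older than the cutoff."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(name for name, seen in last_access.items() if seen < cutoff)


# Hypothetical sample data, not our real accounts.
accounts = {
    "former_consultant": datetime(2014, 1, 10),
    "current_librarian": datetime(2015, 12, 20),
}
print(stale_accounts(accounts, now=datetime(2016, 1, 8)))
```

The list it returns is only a starting point for review; someone still has to decide which accounts to block.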

Clues from Google Search Console

The spammy content was still online. Just as I was investigating the problem, I got this mysterious message in my inbox from Google Search Console (SC). Background: In SC, site owners can set preferences for how their site appears in Google search results and track things like how many other websites link to their website. There’s no ability to change the content; it’s mostly a monitoring site.

reconsideration request from google

I didn’t write that reconsideration request. Neither did our webmaster, Mandy, or anybody who would have access to the Search Console. Lo and behold, the hacker had claimed site ownership in the Search Console:

madlife520 is listed as a site owner in google search console

Now our hacker had a name: Madlife520. (Cool username, bro!) And they’d signed up for SC, probably because they wanted stats for how well their spam posts were doing and to reassure Google that the content was legit.

But Search Console wouldn’t let me un-verify Madlife520 as a site owner. To be a verified site owner, you can upload a special HTML file they provide to your website, with the idea that only a true site owner would be able to do that.

google alert: cannot un-verify as HTML file is still there. FTP client window: HTML file is NOT there.

But here’s where I felt truly crazy. Google said Madlife520’s verification file was still online. But we couldn’t find it! The only verification file was mine (ending in c12.html, not fd1.html). Another invisible file. What was going on? Why couldn’t we see what Google could see?

Finding malicious code

Geng, our whipsmart systems manager, did a full-text search of the files on our server and found the text string google4a4…fd1.html in the contents of a JPG file in …/private/default_images/. Yep, not the actual HTML file itself, but a line in a JPG file. Files in /private/ are usually images uploaded to our slideshow or syllabi that professors send through our schedule-a-class webform — files submitted through Drupal, not uploaded directly to the server.
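
I don't know exactly which tool Geng used, but a full-text search like this can be sketched in a few lines. The version below walks a directory tree and reports every file whose raw bytes contain a given string, regardless of file extension. The function name is hypothetical, and it reads each file fully into memory, which is fine for a web root of modest size but not for very large files.

```python
import os
from typing import Iterator


def files_containing(root: str, needle: bytes) -> Iterator[str]:
    """Yield paths under root whose contents include the byte string needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if needle in handle.read():
                        yield path
            except OSError:
                # Unreadable file (permissions, broken link): skip it.
                continue
```

Searching in binary mode matters here: the match turned up inside a file that claimed to be an image, which a text-only search might have skipped.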

So it looks like this: Madlife520 had logged into Drupal with an inactive account and uploaded a text file with a .JPG extension to a module or form (not sure where yet). This text file contained PHP code that dictated that if Google or other search engines asked for the URL of these spam blog posts, the site would serve up spammy content from another website; if a person clicked on that URL, it would display a 404 Page Not Found page. Moreover, this PHP code spoofed the Google Search Console verification file, making Google think it was there when it actually wasn’t. All of this was done very subtly: aside from weird search results, nothing on the site looked or felt different, probably in the hope that we wouldn’t notice anything unusual so the spam could stay up for as long as possible.
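
A disguised upload like this can often be spotted mechanically, because a real JPEG starts with the bytes FF D8 FF and a text file full of PHP does not. The sketch below flags files with a .jpg or .jpeg extension that either lack that signature or contain a PHP opening tag. The function name is hypothetical, and this is a heuristic only: a genuine image could contain the bytes "<?php" by coincidence, so treat hits as leads to inspect.

```python
import os
from typing import Iterator

JPEG_MAGIC = b"\xff\xd8\xff"  # first three bytes of a JPEG file


def suspicious_jpegs(root: str) -> Iterator[str]:
    """Yield .jpg/.jpeg files that do not look like real JPEG images."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".jpg", ".jpeg")):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue
            # Flag a missing image signature or embedded PHP code.
            if not data.startswith(JPEG_MAGIC) or b"<?php" in data:
                yield path
```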

Steps taken to lock out the hacker

Geng saved a local file of the PHP code, then deleted it from the server. He also made the subdirectory it was in read-only. Mandy, our webmaster, installed the Honeypot module in Drupal, which adds an invisible “URL: ___” field to all webforms; bots fill in the field and are rejected, so they never successfully log in or submit a form. We hoped this might also deter password-cracking software. On my end, I blocked all inactive Drupal accounts, reset all passwords, unverified Madlife520 from Search Console, and blocked IPs that had attempted to access our site a suspiciously high number of times (these IPs were all in a block located in the Netherlands, oddly).
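
Finding the IPs with a suspiciously high number of requests is mostly a counting exercise. The sketch below assumes a web server access log in the common Apache/nginx layout, where the client address is the first whitespace-separated field on each line. The function name and the threshold are illustrative, not the values we used.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def busiest_clients(log_lines: Iterable[str], threshold: int) -> List[Tuple[str, int]]:
    """Return (address, request count) pairs at or above threshold, busiest first."""
    hits: Counter = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            hits[fields[0]] += 1  # first field is assumed to be the client address
    return [(ip, count) for ip, count in hits.most_common() if count >= threshold]
```

To use it on a real log, pass an open file object, for example `busiest_clients(open("access.log"), threshold=1000)`, where the file name and threshold are placeholders. High counts can also come from legitimate crawlers, so check before blocking.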

At this point, Google is still suspicious of our site:

"This site may be hacked" warning beneath Lloyd Sealy Library search result

But I submitted a Reconsideration Request through Search Console — this time, actually written by me.

And it seems that the spammy content is no longer accessible, and we’re seeing far fewer link clicks on our website than before these actions.

marked increase, then decrease in clicked links to our site

I’m happy that we were able to curb the spam and (we hope) lock out the hacker in just over a week, all during winter break when our legitimate traffic is low. We’re continuing to monitor all the pulse points of our site, since we don’t know for sure there isn’t other malicious code somewhere.

I posted this in case someone, somewhere, is in their office on a Friday at 5pm, frantically googling invisible posts drupal spam urls 404??? like I was. If you are, good luck!

Heads Up! in PowerPoint for library class sessions

Since my John Jay colleague Kathleen Collins wrote about using active learning strategies in library “one-shot” sessions, I’ve been experimenting with games and hands-on activities to keep students engaged in the material. Typically, I cover library research basics in the sessions I teach: breaking a research question down into keywords (this is hard for freshmen!) and finding books/articles.

I frequently refer to “Don’t Do Their Work: Active Learning and Database Instruction,” a fantastic article in LOEX by Jennifer Sterling, which covers different active-learning activities she uses in her classroom. One in particular has been a breakout success for my own teaching.

Heads Up! is an iOS/Android app from Ellen DeGeneres (et al.) based on the old game Password, wherein the player who’s “it” must guess a word they can’t see based on hints from their teammates. It’s a great way to get students thinking about synonyms and related words for keywords, and it absolutely starts the class session off with a high energy level.

Because this is happening in the library classroom, I have adapted Heads Up! for a PowerPoint presentation. It’s a little hokey — it’s just a list of words that appear on-click next to a one-minute timer gif. Two volunteers from each side of the room stand in front of the projector screen so they can’t see the words, but their teammates can.

heads up in the library

Download my PowerPoint slides for adapted Heads Up! (adapt further and reuse freely) »
(This version includes different timer gifs on each page. I did this because sometimes PowerPoint glitches out when “restarting” the same gif on a different page.)

I ask for 2 volunteers from each side of the room. Both volunteers can guess when it’s their team’s turn (so that they don’t feel so alone at the front of the room, especially when they’re not doing well). Both teams get 2 rounds, meaning the game lasts around 4 minutes total (plus some banter in between). Usually, students get between 2 and 7 words. Note that these are general words, not library-y words. Something easy and low-barrier to engage students from the get-go. So far, my favorite moment has been for the keyword “Chipotle,” for which half the classroom devolved into students shouting “Bowl! Bowl! BOWL! BOWL!” at their flustered classmate. (“Cereal? Spoon? Plate? Salad?? Soup??”) Probably the most laughter that’s ever occurred on my watch.

I swear by this activity! Students absolutely get the connection between Heads Up! and the next part of my presentation, in which they pick keywords out of their actual research questions and find synonyms and related words, then trade worksheets with a classmate. (Warn them about trading ahead of time, if you’re going to ask them to do this.) This keyword-gathering activity, too, is inspired by that LOEX article.


Download “Keywords” Word Document (adapt and reuse freely) »

Let me know if you use these or other active-learning approaches in your library classes. I’m always looking for fun ways to engage undergrads in the library curriculum.

Update (added August 12, 2016): I updated the slides. Also, there aren’t many 1-minute countdown gifs out there, so I made some in black and white, below. They’re set to run through the animation only once, so don’t worry if they’re all “00,” just download the gif.

one-minute countdown timer gifs

What did I do this year? 2014–15 edition

librarian word cloud

I jump-start my annual self-evaluation process with a low-level text analysis of my work log, essentially composed of “done” and “to do” bullet points. I normalized the text (e.g. emailed to email), removed personal names, and ran all the “done” items through Wordle.
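
If you'd rather see raw counts than a word cloud, the same normalize-and-count steps can be sketched in a few lines. This is only an approximation of what I did by hand: the function name, the normalization map, and the sample items are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set


def word_counts(done_items: Iterable[str],
                normalize: Dict[str, str],
                drop: Set[str]) -> Counter:
    """Count words across done items after lowercasing and normalizing."""
    counts: Counter = Counter()
    for item in done_items:
        for word in re.findall(r"[a-z0-9'-]+", item.lower()):
            word = normalize.get(word, word)  # e.g. "emailed" becomes "email"
            if word not in drop:              # personal names, stop words
                counts[word] += 1
    return counts


# Hypothetical sample of "done" items.
items = ["Emailed Alex about chat hours", "email vendor re: chat"]
counts = word_counts(items, normalize={"emailed": "email"}, drop={"alex"})
print(counts.most_common(2))
```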

2014–15 was my third year in my job and the third time I did this. (See 2012–13 and 2013–14.) I do this because it can be difficult to remember what I was up to many months ago. It’s also a basic visualization of where my time is spent.

What did I do at my job this year?

Aside from the usual meetings, emails, and Reference Desk duties…

  • chat: I implemented a chat reference service with my colleagues (this had been tried before on this campus, but with subpar software and bad staffing experiences; this time, we have limited hours and are very happy with LibraryH3lp)
  • 50th: I worked on a physical and digital exhibit on the 50th anniversary of John Jay
  • mmc: We rolled out the Murder Mystery Challenge for the second year
  • l-etc: I co-chaired the LACUNY Emerging Tech Committee for the second year
  • dc: I worked more on our Digital Collections site, importing materials and refining the UX
  • mla: I went to MLA 2015 in Vancouver and gave a presentation
  • onesearch: We further implemented CUNY’s web-scale discovery service; I organized and ran a usability testing session with my colleagues
  • caug: I began to convene the CollectiveAccess User Group at METRO
  • socialmedia: I became more active on behalf of the library on the @johnjaylibrary Instagram account
  • newsletter: I designed two more biannual issues of Classified Information, our department newsletter
  • drupal, page, fixed, update, added, etc.: I continued to maintain the library’s Drupal-based website

What’s on tap for 2015–16? Lots of online education outreach and much more instruction than I’ve previously done! I’m also starting to flex my writing muscles, starting with a quarterly column in Behavioral & Social Sciences Librarian.

CollectiveAccess work environment

I wrote earlier about our CollectiveAccess workflow for uploading objects one-by-one and in a batch. Now I’ll share our CollectiveAccess work environment. We use two Ubuntu servers, development (test) and production (live), both with CollectiveAccess installed on them. We also use a private GitHub repository.

This is only one example of a CollectiveAccess workflow! See the user-created documentation for more.

Any changes to code (usually tweaking the layout of the front end, Pawtucket) are made first on the dev instance. Once we’re happy with the changes and have tested out the site in different browsers, we commit & push the code to our private GitHub repository using Git commands on the command line. Then we pull it down to our production server, where the changes are now publicly viewable.
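
We type these Git commands by hand, but the sequence is short enough to sketch as a script. The remote name "origin" and the branch name "master" below are assumptions rather than our actual settings, and the function names are hypothetical.

```python
import subprocess
from typing import List


def dev_commands(message: str, remote: str = "origin",
                 branch: str = "master") -> List[List[str]]:
    """Commands run on the dev server: stage, commit, and push the changes."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def prod_commands(remote: str = "origin", branch: str = "master") -> List[List[str]]:
    """Command run on the production server: pull the pushed changes."""
    return [["git", "pull", remote, branch]]


def run_all(commands: List[List[str]], cwd: str) -> None:
    """Run each command in the repository at cwd, stopping on the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)
```

Because `check=True` stops at the first failing command, a rejected commit or push never leads to a pull on production.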

Any changes to objects (uploading or updating objects, collections, etc.) are made directly in the production instance. We never touch the database directly, only through the admin dashboard (Providence). These data changes aren’t done in the dev instance; we only have ~300 objects in the dev server, as more would take up too much room, and there’s no real reason why we should have all our objects on the dev instance. But if there’s a new filetype we’re uploading for the first time, or another reason an object might be funky, we add the object as a test object to the dev server.

Any changes to metadata display (adding a new field in the records) are made through the admin dashboard. I might first try the change on the dev instance, but not necessarily.

Pros of this configuration:

  • code changes aren’t live immediately and there is a structure for testing
  • all code changes can be reverted if they break the site
  • code change documentation is built into the workflow (Git)
  • objects and metadata are immediately visible to the public
  • faculty/staff who only work on the collections don’t need to know anything about Git


Cons of this configuration:

  • increasing mismatch between the dev and production instances’ objects and metadata display (in the future, we might do a batch import/upload if we need to)
  • this workflow has no contact with the CollectiveAccess GitHub, so updates aren’t simply pulled, but rather manually downloaded

Not pictured or mentioned above: our servers are backed up on a regular basis, on- and off-site; and anytime there’s a big code update, a snapshot is taken of the database.

CollectiveAccess super user? Add your workflow to the Sample Workflows page! 

The faculty toolbox for online learning

When I code, I love simply copying and pasting from an example website or someone’s open source code. Most of my projects begin as a collage of different code samples that are gradually tuned to my goal. That copy/paste ethos informed my latest work in progress, the Faculty Toolbox.

What’s inside?

The Faculty Toolbox is a goody bag for John Jay faculty who teach online. Inside, there are special library modules they can drag & drop into a course shell; simple instructions for embedding streaming videos; a proxied link generator; and basic info about library liaisons and how I, the Emerging Technologies & Distance Services Librarian, can support online teaching.
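
For the curious, a proxied link generator can be very small. The sketch below assumes an EZproxy-style login prefix, where the target URL is appended after "login?url="; the proxy hostname is a placeholder, not our real one, and the function name is hypothetical.

```python
# Placeholder prefix; substitute your own library's proxy address.
PROXY_PREFIX = "https://ezproxy.example.edu/login?url="


def proxied_link(target_url: str, prefix: str = PROXY_PREFIX) -> str:
    """Return target_url wrapped in the proxy prefix, unless it already is."""
    target_url = target_url.strip()
    if target_url.startswith(prefix):
        return target_url  # already proxied, so leave it alone
    return prefix + target_url


print(proxied_link("https://www.example.com/article/123"))
```

Checking for an existing prefix keeps a link from being proxied twice when someone pastes in a URL copied from an on-campus session.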

It’s a little thing, but it’s a big thing. The Toolbox has been a conversation piece in multiple meetings I’ve been in, and whenever I unveil it, there’s definitely an ‘ooh!’ response to seeing a collection of useful resources prepackaged and offered on a single page. It’s not just a toolbox; it’s a gateway.

Goody bag + cave of wonders

The terminology I use is important. “Toolbox,” “goody bag,” “starter kit” — these are all phrases that call to mind a plethora of shiny items without being overbearing. There’s no “template” or even “guide” happening here; this is a partnership between the library and faculty, rather than a service or directive. And phrases like “generator” or “drag and drop” are derived from exciting action verbs that imply quickness and ease.

That intentional terminology is a response to one barrier to using library resources in online classes. It’s not that it’s difficult, per se, but it’s a bummer to have to scurry all over the library website(s) to gather teaching materials for students. By all means, that’s part of creating course curricula — but the simpler things, like linking to APA/MLA citation guides, should be easy as pie, and we make it so.

Lastly, the Toolbox can be a Cave of Wonders, too. So many faculty haven’t realized the richness of our streaming video collections. When I show it to them (or when they glance at the sample videos I linked to), a whole new world of engaging course content opens up.

Placement & promotion

The Faculty Toolbox is linked from our Faculty Resources list, where faculty can also find important information about citation metrics and purchase requests. It’s also linked from the John Jay Online faculty resources page, and it’s been emailed to all JJO instructors, too. And in the fall, I’ll be showing it off right and left at a number of workshops in different contexts: Faculty Development Day, Blackboard training workshops, and more.

Blackboard modules from the Library

Our Toolbox was inspired by the one at FIT, which was created by Helen Lane. She mentioned this at an ACRL/NY Distance Ed SIG last year, and it’s an excellent example. Take a look — she makes it so easy to embed many things.

What else would be appropriate to include in the Toolbox?