
Making a Twitter bot in Python (tutorial)

Updated Dec. 2015 to reflect changes on the Twitter Apps page. See bottom of post for even more Twitter bot scripts!

If there’s one thing this budding computational linguist finds delightful, it’s computers that talk to us. From SmarterChild to horse_ebooks to Beetlejuice, I love the weirdness of machines that seem to have a voice, especially when it’s a Twitter bot that adds its murmur to a tweetstream of accounts mostly run by other humans.

CDarwin twitter bot
@cdarwin bot tweets lines from Darwin’s ship log “according to the current date and time so that the Tweets shadow the real world. When it’s the 5th of August here, it’s the 5th August on board ship, albeit 176 years in the past.”

As a fun midnight project a few weeks ago, I cobbled together @MechanicalPoe, a Twitter bot that tweets Poe works line by line on the hour from a long .txt file. This slow-tweeting of text is by no means new: @SlowDante is pretty popular, and so is @CDarwin, among many others. In case you want to make your own, here are the quick ‘n’ easy steps I took. This is just one way of doing it, so shop around and see what others have done, too.

Step 1. Choose your text & chunk it. (Look, I hate the word chunk as much as the next person, but it’s like, what else are we going to say, nuggetize?) In any case, I chose some texts from Project Gutenberg and copied them into separate .txt files. (Maybe don’t choose a long-winded writer.) I ran a script over them to split them up by sentence and mark sentences longer than 140 characters. (Link to chunking script.) There are other scripts that break up long sentences intelligently, but I wanted to exert some editorial control over where the splits occurred in the texts, so the script I wrote puts ‘SPLIT’ next to long sentences to alert me as I go over the ~600 lines by hand. I copied my chunked texts into one .txt file and marked the beginning and end of each individual text. (Link to the finalized .txt file.)
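
For anyone who wants a starting point, here is a minimal sketch of what a chunking script along these lines might look like. It is not the script linked above. The function name chunk_text, the naive sentence-splitting rule, and the choice to put the SPLIT marker at the front of the line are all assumptions made for illustration.

```python
import re

MAX_LEN = 140  # tweet length limit mentioned in the post


def chunk_text(text, max_len=MAX_LEN):
    """Split text into sentences, one per list item, flagging long ones.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    Real prose (abbreviations, quotations) will need hand-checking anyway.
    """
    # Collapse all runs of whitespace, including line breaks, into spaces.
    flat = " ".join(text.split())
    # Split after sentence-ending punctuation, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", flat)

    lines = []
    for sentence in sentences:
        if not sentence:
            continue  # skip the empty string produced by empty input
        if len(sentence) > max_len:
            # Too long to tweet: mark it so it can be split by hand.
            lines.append("SPLIT " + sentence)
        else:
            lines.append(sentence)
    return lines
```

Writing the result out with one item per line gives a file you can search for SPLIT and edit by hand before handing it to the bot.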

Mechanical Poe literary twitter bot
Baby’s first Twitter bot. Tweets Poe hourly, except when it doesn’t.

Step 2. Set up your Twitter developer credentials. Set up your bot’s account, then get into the Applications manager and create a new app. Click the Keys and Access Tokens tab. You’ll see it already gave you a Consumer Key and Consumer Secret right off the bat. Scroll down to create a new Access Token.

Step 3. Configure script. You’ll have to install Tweepy, a Python library for working with the Twitter API. Now take a look at this super-simple 27-line script I wrote based on a few other scripts elsewhere. This script is also on my GitHub:


#!/usr/bin/env python
# -*- coding: utf-8 -*-

# by robincamille - for mechanicalpoe

# Tweets a .txt file line by line, waiting an hour between each tweet.
# Must be running all the time, e.g. on a Raspberry Pi, but would be better
# if rewritten to run as a cron task.

import tweepy, time

#Twitter credentials
CONSUMER_KEY = 'xxxxxxxxxxxxxxx'
CONSUMER_SECRET = 'xxxxxxxxxxxxxxx'
ACCESS_KEY = 'xxxxxxxxxxxxxxx'
ACCESS_SECRET = 'xxxxxxxxxxxxxxx'
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

#File the bot will tweet from
with open('lines.txt','r') as filename:
    f=filename.readlines()

#Tweet a line every hour
for line in f:
    api.update_status(line)
    print(line)
    time.sleep(3600) #Sleep for 1 hour

You’ll see that it takes a line from my .txt file, tweets it, and then waits for 3600 seconds (one hour). Fill in your developer credentials, make any changes to the filename and anything else your heart desires.

Step 4. Run script! You’ll notice that this script must always be running. That is, an IDLE window must always be open running it, or a command line window (to run it in Terminal, simply type python twitterbot.py, or whatever your filename is). A smarter way would be to run a cron task every hour, and you should probably do that instead, but that requires rewriting the last part of the script. For me, MechanicalPoe runs on my Raspberry Pi, and that’s pretty much the only thing the Pi is doing now, so it’s fine for it to be running that script 24/7.
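
The cron-friendly rewrite mentioned above could look something like the sketch below: each run posts exactly one line and then exits, and a small state file remembers which line comes next. The function name tweet_next_line and the state-file approach are assumptions for illustration, not part of the original script. The post argument stands in for whatever actually sends the tweet, for example api.update_status from the script above.

```python
import os


def tweet_next_line(lines_path, state_path, post):
    """Post the next untweeted line and remember the position for next time.

    `post` is any callable that takes the tweet text (hypothetical hook;
    in real use it could be api.update_status).
    Returns the text that was posted, or None when no lines are left.
    """
    # Read the index of the next line to tweet; start at 0 on the first run.
    index = 0
    if os.path.exists(state_path):
        with open(state_path) as state_file:
            index = int(state_file.read().strip() or 0)

    with open(lines_path, encoding="utf-8") as lines_file:
        lines = lines_file.read().splitlines()

    if index >= len(lines):
        return None  # nothing left to tweet

    text = lines[index]
    post(text)

    # Save the position only after the post succeeded, so a failed run
    # retries the same line the next time cron fires.
    with open(state_path, "w") as state_file:
        state_file.write(str(index + 1))
    return text
```

A script that calls this once could then be scheduled hourly with a crontab entry such as 0 * * * * python /home/pi/twitterbot.py (the path is only an example), and nothing would need to stay open in between.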

This is how Edgar Allan Poe lives on... Note the lovely 3D-printed case made for me by pal Jeff Ginger

Gotchas. You might encounter some silly text formatting stuff, like encoding errors for quotation marks (but probably not, since the script declares itself UTF-8). You might also make a boo-boo like I did and miss a SPLIT (below) or try to tweet an empty line (you’ll get an error message, “Missing status”). Also, if you choose a poet like Poe whose lines repeat themselves, Twitter will give you a “Status is a duplicate” error message. I don’t know how long you have to wait before posting the same text again, but that’s why there are gaps in Mechanical Poe’s Twitter record. The script I wrote is too simple to handle this error elegantly. It just crashes, and when you restart it, you’ll have to change the loop to for line in f[125:]: (whatever line it is in your text file, minus 1) to start there instead.

Twitter bot mistake
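
The gotchas above (empty lines, duplicate-status errors, restarting partway through the file) can be softened with a slightly more defensive loop. What follows is a rough sketch under a few assumptions: the name tweet_all, the post and sleep hooks, and the broad except clause are all illustrative. In real use you would pass api.update_status as post and narrow the except to the error class your Tweepy version raises.

```python
import time


def tweet_all(lines, post, start=0, delay=3600, sleep=time.sleep):
    """Tweet lines[start:], skipping blank lines and surviving failed posts.

    `post` is any callable that takes the tweet text (hypothetical hook).
    Returns a list of (index, error) pairs for lines that were not posted.
    """
    failures = []
    for index in range(start, len(lines)):
        text = lines[index].strip()
        if not text:
            continue  # an empty status would be rejected, so skip it
        try:
            post(text)
        except Exception as error:  # assumption: broad catch for the sketch
            # Record the problem and move on instead of crashing.
            failures.append((index, error))
            print("Line %d failed: %s" % (index, error))
            continue  # no waiting after a failed post
        print(text)
        sleep(delay)
    return failures
```

Calling tweet_all(f, api.update_status, start=125) would pick up at line 125 without editing the loop itself, and the returned list shows which lines were skipped because of errors such as duplicates.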

Further reading:

Update Dec. 2015: My colleague Mark Eaton and I led a one-day Build Your Own Twitter Bot workshop. We built five ready-made Twitter bots. See the tutorial and get the Python scripts on my GitHub. I updated the above tutorial to reflect a different Apps panel in Twitter, too.

Python + BeautifulSoup + Twitter + Raspberry Pi

In my ongoing experiments with my Raspberry Pi, I’ve been looking for small ways it can be useful for the library. I’ve been controlling my Pi remotely using SSH in Terminal (tutorial — though you’ll have to note your Pi’s IP address first). As I noted yesterday, I’ve been making it tweet, but was looking to have it share information more interesting than a temperature or light reading. So now I have the Pi tweeting our library’s hours on my test account:

Tweeting library hours

To do this, I installed BeautifulSoup, a Python library for working with HTML. My Python script uses BeautifulSoup to search the library’s homepage and find two spans with the classes date-display-start and date-display-end. (This is looking specifically at a view in Drupal that displays our daily hours.) Then it grabs the content of those spans and plunks it into a string to tweet. Here’s the script:

#!/usr/bin/env python
import tweepy
from bs4 import BeautifulSoup
import urllib3

CONSUMER_KEY = '********************' #You'll have to make an application for your Twitter account
CONSUMER_SECRET = '********************' #Configure your app to have read-write access and sign in capability
ACCESS_KEY = '********************'
ACCESS_SECRET = '********************'

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

http = urllib3.PoolManager()

web_page = http.request('GET','http://www.lib.jjay.cuny.edu/')
web_page_data = web_page.data

soup = BeautifulSoup(web_page_data, 'html.parser') #name the parser explicitly to avoid a bs4 warning
openh = soup.find('span','date-display-start') #spans as defined in Drupal view
closedh = soup.find('span','date-display-end')
other = soup.find('span','date-display-single')

if openh: #if library is open today, tweet and print hours
    openh = openh.get_text() + ' to '
    closedh = closedh.get_text()
    api.update_status("Today's Library hours: " + openh + closedh + '.')
    print("Today's Library hours: " + openh + closedh + '.')
elif other: #if other message (eg Closed), tweet and print
    other = other.get_text()
    api.update_status("Today's Library hours: " + other + '.')
    print("Today's Library hours: " + other + '.')
else:
    print("I don't know what to do.")
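
To try the scraping logic without hitting the library's site or posting anything, the span lookup can be exercised offline. The sketch below swaps in Python's standard-library html.parser for BeautifulSoup's soup.find, purely so that it runs with no extra installs. The class SpanTextFinder, the function hours_message, and any sample HTML you feed it are hypothetical, since the real Drupal markup isn't shown in this post.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collect the text of the first <span> carrying each wanted class."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.found = {}       # class name -> text of first matching span
        self._current = None  # class currently being captured, if any
        self._depth = 0       # nested spans inside the captured span
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._current is not None:
            self._depth += 1  # a span nested inside the one being captured
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted and name not in self.found:
                self._current, self._depth, self._parts = name, 0, []
                break

    def handle_data(self, data):
        if self._current is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        self.found[self._current] = "".join(self._parts)
        self._current = None


def hours_message(html):
    """Build the tweet text using the same branches as the script above."""
    finder = SpanTextFinder(
        ["date-display-start", "date-display-end", "date-display-single"])
    finder.feed(html)
    found = finder.found
    if "date-display-start" in found and "date-display-end" in found:
        return ("Today's Library hours: " + found["date-display-start"]
                + " to " + found["date-display-end"] + ".")
    if "date-display-single" in found:
        return "Today's Library hours: " + found["date-display-single"] + "."
    return None  # no recognizable hours on the page
```

Keeping the message-building step separate from the network request and the tweet makes it easy to check what the bot would say for an open day, a closed day, and a page with no hours at all.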

Python libraries used: Tweepy, BeautifulSoup (bs4), and urllib3.

I’ve configured cron to post at 8am every morning:

sudo crontab -e
[I added this line:]
00 8 * * * python /home/pi/Projects/Twitter/libhours-johnjaylibrary.py

Notes: I looked at setting up an RSS feed based on the Drupal view, since web scraping is clunky, but no dice. Also, there’s no real reason why automated tweeting has to be done on the Pi rather than a regular ol’ computer, other than I’d rather not have my iMac on all the time. And it’s fun.