Tag: python

Python + BeautifulSoup + Twitter + Raspberry Pi

In my ongoing experiments with my Raspberry Pi, I’ve been looking for small ways it can be useful for the library. I’ve been controlling my Pi remotely using SSH in Terminal (tutorial — though you’ll have to note your Pi’s IP address first). As I noted yesterday, I’ve been making it tweet, but was looking to have it share information more interesting than a temperature or light reading. So now I have the Pi tweeting our library’s hours on my test account:

Tweeting library hours

To do this, I installed BeautifulSoup, a Python library for working with HTML. My Python script uses BeautifulSoup to search the library’s homepage and find two spans with the classes date-display-start and date-display-end. (This is looking specifically at a view in Drupal that displays our daily hours.) Then it grabs the content of those spans and plunks it into a string to tweet. Here’s the script:

#!/usr/bin/env python
import tweepy
from bs4 import BeautifulSoup
import urllib3

CONSUMER_KEY = '********************' #You'll have to make an application for your Twitter account
CONSUMER_SECRET = '********************' #Configure your app to have read-write access and sign in capability
ACCESS_KEY = '********************'
ACCESS_SECRET = '********************'

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

http = urllib3.PoolManager()

web_page = http.request('GET','http://www.lib.jjay.cuny.edu/')
web_page_data = web_page.data

soup = BeautifulSoup(web_page_data)
openh = soup.find('span','date-display-start') #spans as defined in Drupal view
closedh = soup.find('span','date-display-end')
other = soup.find('span','date-display-single')

if openh: #if library is open today, tweet and print hours
openh = openh.get_text() + ' to '
closedh = closedh.get_text()
api.update_status("Today's Library hours: " + openh + closedh + '.')
print "Today's Library hours: " + openh + closedh + '.'
elif other: #if other message (eg Closed), tweet and print
other = other.get_text()
api.update_status("Today's Library hours: " + other + '.')
print "Today's Library hours: " + other + '.'
else:
print "I don't know what to do."

Python libraries used:

I’ve configured cron to post at 8am every morning:

sudo crontab -e
[I added this line:]
00 8 * * * python /home/pi/Projects/Twitter/libhours-johnjaylibrary.py

Notes: I looked at setting up an RSS feed based on the Drupal view, since web scraping is clunky, but no dice. Also, there’s no real reason why automated tweeting has to be done on the Pi rather than a regular ol’ computer, other than I’d rather not have my iMac on all the time. And it’s fun.

Generating red link links for Wikipedia

Red link listsOn Global Women Wikipedia Write-In Day, I was extremely impressed with user Dsp13’s lists of red links — lists of notable women that hadn’t yet been written about on Wikipedia. I used that page as a springboard to write about some notable women in American history, like the wonderful Agnes Surriage Frankland. Dsp13 took these lists of names from resources like Famous American Women: A Biographical Dictionary, signifying the notability of the listed names and giving editors a place to start their research.

I wanted to do something similar for printing history, a research interest of mine. Here is a red link list I cobbled together. As it turns out, there are tons of other red link lists, too! I’m not sure how other people are generating them — probably from databases or other lists already in digital form. (Any info?) But many good resources are in book form, sometimes keyboarded but likely scanned and OCR’d. To make my red link lists, I’m taking indexes from scanned books and generating lists in wiki format for my user page.

Index, wikified and listified
Messy OCR’d book index » cleaned up and put in wiki format » list on Wikipedia

Generating wiki lists from indexes

  1. Find an interesting, useful book with an index that’s been keyboarded or OCR’d. These will likely be on Gutenberg or Internet Archive. (Example.)
  2. Copy/paste the index into a plain text editing program like TextWrangler.
  3. Strip out the unnecessary stuff in the index (like page numbers), manually remove redundant/unimportant lines (optional), and format the list for Wikipedia (switch reversed names, put in *[[title]] formatting, split into columns). I wrote a couple of messy Python scripts for this step.
  4. Copy/paste the resulting text into your user page. 

Now you can get a quick visual of how many of these entities still need to be written up!

Note that some entities might be notable, and some might not be. And of course, some blue-linked wiki pages might not describe the right entity or might lead to a disambiguation page. Regardless, it’s a place to start!