git – Emerging Tech in Libraries

CollectiveAccess work environment

I wrote earlier about our CollectiveAccess workflow for uploading objects one by one and in batches. Now I’ll share our CollectiveAccess work environment. We use two Ubuntu servers, development (test) and production (live), each with CollectiveAccess installed. We also use a private GitHub repository.

This is only one example of a CollectiveAccess workflow! See the user-created documentation for more.

Any changes to code (usually tweaks to the layout of the front end, Pawtucket) are made first on the dev instance. Once we’re happy with the changes and have tested the site in different browsers, we commit and push the code to our private GitHub repository using Git on the command line. Then we pull it down to our production server, where the changes become publicly viewable.
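For readers new to Git, here is a minimal sketch of that command sequence, written as a small Python helper. It is an illustration under stated assumptions, not our actual script: the remote name `origin`, the branch name `master`, and the commit message are hypothetical, and the commands are only printed unless you pass `dry_run=False`.

```python
# Hypothetical sketch of the dev -> GitHub -> production code flow.
# Assumptions (not stated in the post): the GitHub remote is named "origin",
# the deployed branch is "master", and each command runs from the root of the
# Git checkout on the relevant server.
import shlex
import subprocess

REMOTE = "origin"  # assumed remote name
BRANCH = "master"  # assumed branch name


def dev_commands(message: str) -> list[list[str]]:
    """Commands run on the dev server after the changes pass browser testing."""
    return [
        ["git", "status"],                 # review what changed
        ["git", "add", "-A"],              # stage modified, new, and deleted files
        ["git", "commit", "-m", message],  # record the change with a description
        ["git", "push", REMOTE, BRANCH],   # send the commit to the private GitHub repo
    ]


def production_commands() -> list[list[str]]:
    """Commands run on the production server to make the changes live."""
    return [["git", "pull", REMOTE, BRANCH]]


def rollback_commands(commit: str) -> list[list[str]]:
    """Run on dev: undo a bad commit by adding a new, inverse commit."""
    return [
        ["git", "revert", "--no-edit", commit],
        ["git", "push", REMOTE, BRANCH],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dev_commands("Tweak Pawtucket header layout"))  # hypothetical message
    run(production_commands())
```

`git revert` is used for rollback because it adds a new commit instead of rewriting history, so production can pick up the fix with the same `git pull` it always runs.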

Any changes to objects (uploading or updating objects, collections, etc.) are made directly in the production instance. We never touch the database directly, only through the admin dashboard (Providence). These data changes aren’t made on the dev instance; we have only ~300 objects on the dev server, since more would take up too much room, and there’s no real reason to have all our objects on the dev instance. But if we’re uploading a new filetype for the first time, or there’s another reason an object might be funky, we add it as a test object on the dev server.

Any changes to metadata display (such as adding a new field to the records) are made through the admin dashboard. I might try the change on the dev instance first, but not always.

Pros of this configuration:

code changes aren’t live immediately and there is a structure for testing
all code changes can be reverted if they break the site
code change documentation is built into the workflow (Git)
objects and metadata are immediately visible to the public
faculty and staff who work only on the collections don’t need to know anything about Git

Cons:

increasing mismatch between the dev and production instances’ objects and metadata display (in the future, we might do a batch import/upload if we need to)
this workflow has no connection to the CollectiveAccess GitHub repository, so updates can’t simply be pulled; instead, they’re downloaded manually

Not pictured or mentioned above: our servers are backed up regularly, both on- and off-site, and any time there’s a big code update, we take a snapshot of the database.
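The post doesn’t say how the snapshot is taken. As one possible approach, here is a sketch that assumes the MySQL database CollectiveAccess runs on is dumped with `mysqldump` into a timestamped file; the database and user names are placeholders.

```python
# Hypothetical sketch of a pre-update database snapshot.
# Assumptions: CollectiveAccess's MySQL database is dumped with mysqldump;
# the database and user names below are placeholders.
import subprocess
from datetime import datetime


def snapshot_command(db_name: str, user: str, when: datetime) -> list[str]:
    """Build a mysqldump command that writes a timestamped .sql file."""
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return [
        "mysqldump",
        "--single-transaction",  # consistent snapshot of InnoDB tables without locking them
        f"--user={user}",
        "--password",            # prompt for the password rather than exposing it in argv
        f"--result-file={db_name}-{stamp}.sql",
        db_name,
    ]


def take_snapshot(db_name: str, user: str) -> None:
    """Run the dump; raises CalledProcessError if mysqldump fails."""
    subprocess.run(snapshot_command(db_name, user, datetime.now()), check=True)


if __name__ == "__main__":
    print(snapshot_command("collectiveaccess", "ca_user", datetime.now()))
```

The timestamp in the filename keeps each pre-update snapshot separate, so an older one is never overwritten by a newer dump.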

CollectiveAccess super user? Add your workflow to the Sample Workflows page!

Using Drupal and Git for a library website

The Lloyd Sealy Library website uses Drupal 7 as its content management system and Git for version control. The tricky thing about this setup is that Git can track some parts of a Drupal site, but not all: code can be tracked in Git, but content can’t be.

Code

theme files (CSS, PHP, INC, etc.)
Drupal core (the out-of-the-box system)
all modules
any core or module updates (made on dev, then pushed to production)

Content

anything in the Drupal database:
- written content (pages, blog posts, menus, etc.)
- configurations (preferences, blocks, regions, etc.)

Here’s our workflow:

Code: Using Git to push code from dev to production is pretty straightforward. I was an SVN gal, so getting used to Git’s extra steps took some time. I learned from video tutorials made by our consultants at Cherry Hill as well as Lynda.com videos. (For those new to version control: it’s a mandatory practice if you manage institutional websites. Using version control across two servers lets you work on the same code simultaneously with other people and roll out changes deliberately. Version control also keeps track of every change made over time, so if you mess up, you can easily revert your site to a safe version.)

Content: Keeping the content up to date on both servers is a little hairier. We use the Backup and Migrate module to update our dev database, on an irregular schedule, with new content created on the production server. The main reason to update the dev database is so that our dev and production sites aren’t confusingly dissimilar; additionally, some CSS might refer to classes newly specified in the database content. The schedule is irregular because the webmaster, Mandy, and I sometimes test content on the dev side first (like a search box) before copying it manually onto the production site.

Why have a two-way update scheme? Why not do everything on dev first, and restore the database from dev to production? We want most content changes to be publicly visible immediately. All of our librarians have editor access, which was one of the major appeals of using a CMS that allowed different roles. Every librarian can edit pages and write blog posts as they wish. It would be silly to embargo these content additions.

Help: A lot of workflow points are covered in Drupal’s help page, Building a Drupal site with Git. As with all Drupal help pages, though, parts of it are incomplete. The Drupal4Lib listserv is very active and helpful for both general and library-specific Drupal questions.

Non-Drupal files: Lastly, we have some online resources outside of Drupal that we don’t want clogging up our remote repository, like the hundreds of trial transcript PDFs. These aren’t going to be changing, and they’re not code. The trial transcript directory is therefore listed in our .gitignore file.
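To illustrate what that `.gitignore` entry does, here is a deliberately simplified Python sketch. The directory name `trial-transcripts/` is hypothetical, and the matcher handles only root-anchored directory patterns; real `.gitignore` matching also supports globs, `!` negation, and unanchored patterns that match at any depth.

```python
# Simplified sketch of how a directory entry in .gitignore keeps files out of Git.
# Only root-anchored directory patterns such as "/trial-transcripts/" are handled;
# the directory name is hypothetical.
from pathlib import PurePosixPath

GITIGNORE = """
# Static trial transcript PDFs: not code, and they never change
/trial-transcripts/
"""


def ignored_dirs(gitignore_text: str) -> list[PurePosixPath]:
    """Collect root-anchored directory patterns, skipping blanks and comments."""
    dirs = []
    for line in gitignore_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("/") and line.endswith("/"):
            dirs.append(PurePosixPath(line.strip("/")))
    return dirs


def is_ignored(path: str, gitignore_text: str = GITIGNORE) -> bool:
    """True if the path is an ignored directory or lies anywhere inside one."""
    p = PurePosixPath(path)
    return any(p == d or d in p.parents for d in ignored_dirs(gitignore_text))


if __name__ == "__main__":
    for path in ["trial-transcripts/1921/smith.pdf", "sites/all/themes/custom/style.css"]:
        print(path, "ignored" if is_ignored(path) else "tracked")
```

Because the pattern starts with `/`, only the top-level directory is excluded; a folder with the same name deeper in the tree would still be tracked.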

Any Drupal/Git workflow tips?
