CollectiveAccess workflow

I’ve gotten a few emails lately from other library/archive organizations asking about how we use CollectiveAccess, open-source software for digital collections at museums, archives, and libraries. Our Digital Collections at John Jay launched earlier this year and runs on CollectiveAccess. We’re really happy with it! Since it’s designed for archival-style content from the get-go, there are a lot of really nice library-friendly touches.

For those considering CollectiveAccess, it might be helpful to see what it looks like to use the software. CollectiveAccess takes a good amount of elbow grease to get up and going (more than Omeka, for instance), but the workflows are pretty straightforward once your installation is stable.

Uploading objects to CollectiveAccess

So how exactly do you populate your CollectiveAccess site? First, I’ll define a few special words that CollectiveAccess uses:

object: the thing you digitized. E.g., a photograph, a book, a document. Our rule of thumb is that one physical object = one digital object. Each object is of one type…

object type: what category is the thing? This will affect what metadata fields you’ll fill in. For instance, our object type “Trial transcript” has fields for “court” and “stenographer’s number,” which only apply to this object type.

media representation: an uploaded file. One object can have multiple representations. A photograph-type object might have two media representations: scans of the front and back. Or an oral history might have a PDF and several audio clips.

collection: the conceptual group that contains objects. A collection can have multiple objects. Again, our rule of thumb is one physical collection in the archives = one digital collection. Makes it easy! Makes total sense! (Okay, sometimes we fudge a little.) See our list of collections in our Digital Collections.

Note: the workflows below are just how we use the software. Other places may differ. But it’s useful to see examples. This also assumes that you’re logged into the back end and your metadata schema are good to go.

Screenshot of CollectiveAccess, editing a single object

Our workflow for uploading objects one at a time

Example: we had student workers create the John Jay College Archives collection by scanning and inputting metadata, one thing at a time (reviewed later by librarians)

  1. Click “New object” in CollectiveAccess, choosing appropriate object type 
  2. Write in metadata, either basic or complete, following your organization’s conventions 
  3. Upload the media file (this can’t be done first, because the uploaded file needs an object identifier to latch on to, and that identifier is assigned in step 2) 
  4. Review, then make publicly accessible 

Template for data import in CollectiveAccess. This works in conjunction with another spreadsheet that has metadata related to cases on it. (Email me for more example templates.)

Our workflow for batch uploads, when we already have all metadata and media files

Example: migrating files and metadata out of an old database, which is what we’re currently doing for our trial transcripts collection

  1. Batch-upload metadata, using the filename as identifier 
    • data import for CA is complicated to understand at first, but once you get your spreadsheets and templates in order, it’s amazing and fast
    • this step creates a bunch of objects that don’t have media files attached to them (they’re just records) 
    • you might have to do multiple data imports, either to split up a big data set or because your data is complicated (e.g., we have lots of overlapping person data: defendants, judges, etc.)
  2. Batch-upload files, matching on filename to existing objects. Takes a while
  3. Review, then make publicly accessible
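
Because step 2 matches on filename, it can help to compare your metadata spreadsheet against your media folder before importing anything. Here is a minimal sketch of that kind of check in Python. It is not part of CollectiveAccess: the function name, the "identifier" column name, and the sample identifiers are all made up for illustration, and it assumes the identifier is the filename without its extension.

```python
import csv
import io
from pathlib import PurePath


def check_batch(metadata_csv, media_filenames, id_column="identifier"):
    """Compare identifiers in a metadata sheet against media filenames.

    Assumes (hypothetically) that each identifier equals a media
    filename with the extension removed, e.g. "tt_001" for "tt_001.pdf".
    """
    reader = csv.DictReader(io.StringIO(metadata_csv))
    identifiers = set()
    for row in reader:
        value = (row.get(id_column) or "").strip()
        if value:
            identifiers.add(value)

    # Group media files by their name without the extension.
    by_stem = {}
    for name in media_filenames:
        by_stem.setdefault(PurePath(name).stem, []).append(name)
    stems = set(by_stem)

    orphan_files = [
        name
        for stem in stems - identifiers
        for name in by_stem[stem]
    ]
    return {
        "matched": sorted(identifiers & stems),
        "records_without_media": sorted(identifiers - stems),
        "media_without_record": sorted(orphan_files),
    }


# Made-up sample data, just to show the shape of the report.
sample_sheet = "identifier,title\ntt_001,Sample transcript A\ntt_002,Sample transcript B\n"
sample_files = ["tt_001.pdf", "tt_003.pdf"]
report = check_batch(sample_sheet, sample_files)
for key, values in report.items():
    print(key, values)
```

Anything listed under "records_without_media" would end up as a bare record after step 1, and anything under "media_without_record" would have nothing to latch on to in step 2, so both lists are worth cleaning up first.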

When you upload a file to CollectiveAccess, it can take a while because it creates a lot of derivatives. For example, one uploaded photo generates all these files:

Screenshot of derivative filenames from CollectiveAccess

It also stores the original file, though it’s up to you to decide which derivative you allow users to download, if any. Our users can view objects in high resolution (in a special image viewing frame) and download full PDFs, but can only download medium-size JPGs for images. For print-quality images, a user must contact our Special Collections librarian. This helps ensure accurate citations.

NYC-area CollectiveAccess events

The CollectiveAccess software is made right here in the city! In September, the friendly CollectiveAccess developers led a workshop at METRO that walked us through configuring new CA installations and importing sample data. The workshop materials are still online and are incredibly useful in piecing together the data import process.

I’m the convener of the CollectiveAccess User Group here in NYC. Our next meeting is Monday, December 1, 2014 at 10am at METRO. We’ll get behind-the-scenes tours of CollectiveAccess installations at Brooklyn Navy Yard, Roundabout Theatre Company, Booklyn, and New York Society Library. The CA team attends User Group meetings, too, and is as helpful and responsive in person as they are in the support forums. If you’re interested in CollectiveAccess, register for free & join us at METRO!


  1. Hey Robin,

    Thanks a lot for this article. I’ve been having a great time playing with the demo, and also getting it to run with Pawtucket on my local machine. We’re going to migrate some data from MusArch, so the comment about data mapping being easy in the end is really timely and motivating. I can’t wait to get cataloguing!

  2. Does CA accommodate hierarchical keywords and categories, such as TGM1 used by Library of Congress?
    If one enters a keyword (e.g. wrestling) will a search of “sports” include wrestling?

    What standard terms do you use?

    • Do you mean, does CollectiveAccess support controlled vocabularies? See the CollectiveAccess documentation.

      For the Digital Collections at John Jay, we use LCSH, but some of our original-cataloging subjects were not available through the LC API that CollectiveAccess uses. For now, we’re using that standard vocabulary, but in a free-text field.

      For your particular example, a site search for ‘sports’ would include ‘wrestling’ only if the wrestling photo were catalogued with ‘sports’ as a subject or in the description.

  3. Eugene De La Rosa says:

    Does CollectiveAccess automatically generate derivatives out of the box, or does this feature have to be configured during setup?

    • CA generates derivatives out of the box! Using ImageMagick (or other software), CA creates about a dozen different sizes. In the image viewer, users can zoom into the biggest size. You get to pick what size images to display and allow users to download.
