CollectiveAccess importing workflow

This step-by-step workflow illustrates how I import objects (metadata + files) into CollectiveAccess. I’m writing this post partly to give others an idea of how to import content into CollectiveAccess — but mainly it’s for my future self, who will likely have forgotten!

Caveats: Our CollectiveAccess instance is version 1.4, so some steps or options might not be the same for other versions. This is also just a record of what we at John Jay do when migrating/importing collections, so the steps might have to be different at your institution.

Refer to the official CollectiveAccess documentation for much more info: metadata importing and batch-uploading media. These are helpful and quite technical.

CollectiveAccess importing steps

Do all of these steps in a dev environment first to make sure everything is working, then repeat them on your live site.

  1. Create Excel spreadsheet of metadata to migrate
    • Here’s our example (.xlsx) from when we migrated some digitized photos from an old repo to CA
    • This can be organized however you want, though it may be easiest for each column to be a Dublin Core field. In ours, we have different fields for creators that are individuals vs. organizations.
  2. Create another Excel spreadsheet that will be the “mapping template” aka “importer”
    • Download the starter template (.xlsx) from CA wiki. This whole step is hard to understand, by the way, so set aside some time.
    • Here’s our example (.xlsx), which refers to the metadata spreadsheet above.
    • Every number in the “Source” column refers to the metadata spreadsheet: 1 is column A, 2 is B, …
    • Most of these will be Mapping rules, e.g. if Column A is the title of the object, the rule type would be Mapping, Source would be 1, and CA table element would be ca_objects.preferred_labels
      • Get the table elements from within CA (requires admin account): see Manage → Administration → User interfaces → Your Object Editor [click page icon] → Main Editor [click page icon] → Elements to display on this screen
      • Example row:
        Rule type   Source   CA table.element   Options
        Mapping     9        ca_objects.lcsh    {"delimiter": "||"}
    • Don’t forget to fill out the Settings section below with importer title, etc.
  3. On your local machine, make a folder of the files you want to import
    • Filenames should match the identifiers in the metadata sheet. This is how CA knows which files to attach to which metadata records.
    • Only the primary media representations should be in this folder. Secondary files (e.g., a scan of the back of a photograph) should go in a different folder. These must be added manually, as far as I know.
  4. Upload the folder of items to import to pawtucket/admin/import.
    • Run chmod 744 on all items inside the folder once you’ve done this, otherwise you’ll get an “unknown media type” error later.
  5. (Metadata import) In CA, go to Import → Data, upload the mapping template, and click the green arrow button. Select the metadata spreadsheet as the data format
    • “Dry run” may actually import (bug in v. 1.4, resolved in later version?). So again, try this in dev first.
    • Select “Debugging output” so if there’s an error, you’ll see what’s wrong
    • This step creates objects that have their metadata all filled out, but no media representations.
    • Imported successfully? Look everything over.
  6. (Connect uploaded media to metadata records) In CA, go to Import → Select the directory you uploaded in step 4.
    • “Import all media, matching with existing records where possible.”
    • “Create set ____ with imported media.”
    • Set object status to inaccessible and media representation access to accessible, so that you have a chance to look everything over before it’s public. (As far as I know, it’s easy to batch-edit object access, but hard to batch-edit media access.)
    • On the next screen, CA will slowly import your items. Guesstimate 1.5 minutes for every item. Don’t navigate away from this screen.
  7. Navigate to the set you just created and spot-check all items.
    • Batch-edit all objects to accessible to public when satisfied
  8. Add secondary representations manually where needed.
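
A few of the fiddly bits in steps 2 to 4 can be checked with a short script before anything touches CA. Here is a minimal Python sketch. It is not part of CollectiveAccess, and the function names are made up for illustration. It assumes a file matches a record when its name minus the extension equals the identifier exactly, which may not be how your CA version does its matching. You would also have to pull the list of identifiers out of your metadata spreadsheet yourself.

```python
import os
from pathlib import Path


def source_number(column_letter):
    """Convert a spreadsheet column letter (A, B, ..., AA) to the 1-based
    number that goes in the mapping template's Source column."""
    letters = column_letter.strip().upper()
    if not letters:
        raise ValueError("empty column letter")
    number = 0
    for char in letters:
        if not "A" <= char <= "Z":
            raise ValueError(f"not a column letter: {column_letter!r}")
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number


def compare_media_to_identifiers(identifiers, media_dir):
    """Return (identifiers with no file, files with no identifier).

    Assumption: a file belongs to a record when its name without the
    extension is exactly the identifier.
    """
    stems = {p.stem: p.name for p in Path(media_dir).iterdir() if p.is_file()}
    wanted = set(identifiers)
    missing_files = sorted(wanted - stems.keys())
    unmatched_files = sorted(
        name for stem, name in stems.items() if stem not in wanted
    )
    return missing_files, unmatched_files


def make_readable(media_dir, mode=0o744):
    """Apply the chmod 744 from step 4 to every item inside media_dir."""
    for root, dirs, files in os.walk(media_dir):
        for name in dirs + files:
            os.chmod(os.path.join(root, name), mode)
```

For example, `source_number("I")` gives 9, which is the Source value in the example row above. `compare_media_to_identifiers` is meant to run on your local machine before uploading, while `make_readable` would have to run on the server where the import folder lives.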

You may need to create multiple metadata spreadsheets and mapping templates if you’re importing a complex database. For instance, for trial transcripts that had multiple kinds of relationships with multiple entities, we just did 5 different metadata imports that tacked more metadata onto existing objects, rather than creating one monster metadata import.
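
If you start from one wide sheet, one way to prepare several smaller imports is to split it by column group, keeping the identifier column in every piece. The sketch below is only an illustration under a few assumptions: the sheet has been exported to CSV first, the identifier column name and the pass names are hypothetical, and each resulting file would still need its own mapping template (and possibly conversion back to .xlsx, depending on what your importer expects).

```python
import csv
from pathlib import Path


def split_metadata(source_csv, id_column, passes, out_dir):
    """Write one CSV per import pass.

    passes maps a pass name to the list of columns for that pass.
    Every output file keeps id_column first so each pass can be tied
    back to the same object.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Read the whole sheet once; each row becomes a dict keyed by header.
    with open(source_csv, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

    written = []
    for pass_name, columns in passes.items():
        fieldnames = [id_column] + [c for c in columns if c != id_column]
        target = out_dir / f"{pass_name}.csv"
        with open(target, "w", newline="", encoding="utf-8") as handle:
            # extrasaction="ignore" drops the columns not in this pass.
            writer = csv.DictWriter(
                handle, fieldnames=fieldnames, extrasaction="ignore"
            )
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```

A call might look like `split_metadata("all.csv", "identifier", {"pass1_basic": ["title"], "pass2_entities": ["creator", "subject"]}, "out")`, where every name is a placeholder for whatever your own sheet uses.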

You can switch steps 5 and 6 if you want, I believe, though since 5 is easy to look over and 6 takes a long time to do, I prefer my order.

Again, I urge you to try this on your dev instance of CA first (you should totally have a dev/test instance). And let me know if you want to know how to batch-delete items.

Good luck!
