We use EZproxy to provide off-campus users with access to subscription resources that require a campus-specific login. Every time a user visits an EZproxy-linked page (mostly by clicking on a link in our list of databases), that activity is logged. The logs are split by month and come in two forms: complete (~1 GB for us) or abridged (~10 MB). The complete logs look like this:
We can get pretty good usage stats from the individual database vendors, but with monthly logs like these, why not analyze them yourself? You could do this in Excel, but Python is much more flexible, and also, I’ve already written the script for you. It very hackily analyzes on- vs. off-campus vs. in-library use, as well as student vs. faculty use.
Use it on the command line like so:
python ezp-analysis.py [directory to analyze] [desired output filename.csv]
The script will spit out a CSV that looks like this:
With which you can then do as you please.
- “Sessions” are different from “connections.” A session covers everything someone does after logging into EZproxy; a connection is a single HTTP request. Sessions can only be tracked for off-campus use, as they rely on a session ID. On-campus EZproxy use doesn’t get a session ID and so can only be tracked with connections, which are less useful. On-campus use doesn’t tell us anything about student vs. faculty use, for instance.
- Make sure to change the IP address specifications within the script. As it is, it counts “on campus” as IP addresses beginning with “10.” and in-library as beginning with “10.11.” or “10.12.”
- This is a pretty hacky script, and I make no guarantees as to its accuracy. Go over it with a fine-toothed comb and make sure your output lines up with what you see in your other data sources.
- This output is only a sketch of how EZproxy may be used on your campus.
- Please take a good look at the logs you’re analyzing and familiarize yourself with them — otherwise you may get the wrong idea about this script’s output!
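To make the IP and session notes above a bit more concrete, here is a minimal sketch of how that kind of classification could work. It is not the code from ezp-analysis.py. The names `classify_ip` and `summarize` are made up for illustration, and it assumes each log line has already been parsed into an (IP address, session ID) pair, with "-" standing in when there is no session ID. It also assumes the three locations are mutually exclusive, which may not match how the real script counts in-library use.

```python
from collections import Counter

# Prefixes taken from the defaults described above; change them for your campus.
LIBRARY_PREFIXES = ("10.11.", "10.12.")
CAMPUS_PREFIXES = ("10.",)


def classify_ip(ip):
    """Return 'in-library', 'on-campus', or 'off-campus' for a dotted IPv4 string."""
    # Check the library prefixes first, since they fall inside the campus range.
    if ip.startswith(LIBRARY_PREFIXES):
        return "in-library"
    if ip.startswith(CAMPUS_PREFIXES):
        return "on-campus"
    return "off-campus"


def summarize(records):
    """Count connections per location and distinct off-campus sessions.

    `records` is a hypothetical iterable of (ip, session_id) pairs, where
    session_id is "-" when the log line carries no session ID.
    """
    connections = Counter()
    sessions = set()
    for ip, session_id in records:
        location = classify_ip(ip)
        connections[location] += 1  # every log line is one connection
        if location == "off-campus" and session_id != "-":
            sessions.add(session_id)  # repeated IDs belong to the same session
    return connections, len(sessions)
```

Note the trailing dot in each prefix: without it, "10.1" would also match an address like "10.110.0.1". If your campus ranges don't line up neatly with dotted prefixes, Python's `ipaddress` module is probably a safer way to test network membership.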
Preliminary findings at John Jay
Variance: Library resource use via EZproxy varies widely month to month. Compared to January, we get over 3x the use of library resources in November. The data follows the flow of the school year.
Students vs. faculty: When school is in session, EZproxy use is 90% students and 10% faculty. When school is not in session, those percentages pretty much flip around.
Session count: At first, it seems very awesome that we’re getting 11,500 off-campus sessions in our busy months! But then I realize we have over 15,000 students. Hm. Not quite as awesome.
No connection increase: Between 2008 and 2014, database use off-campus hasn’t really seen a big increase, even though we’ve enrolled more students and now have many more online classes:
Kind of strange, no? Still not sure what to make of the data.
If you have suggestions or questions about the script, please do leave a comment!