Visualizing Library Collections

Lessons and Challenges from the Artists' Book Collection,

Frick Fine Arts Library, University of Pittsburgh

Anna-Sophia Zingarelli-Sweet

LIS2975: Digital Scholarship

December 2013

The artists' book collection

Started at the Frick Fine Arts Library in the early 1990s
Contains 160+ one of a kind, limited edition, altered, and conceptual artists' books

Jeffrey Morin, Sacred Space, 2002

Why Visualize?

The collection was developed under the auspices of the former Head Librarian, Ray Anne Lockard, now retiring
There is no formal collection policy
No other staff member has worked closely with developing the collection or has specialized knowledge of artists' books
This project proposes that visualizing the existing collection can help identify its strengths and weaknesses, trends and focal points, to inform future collection development and potential establishment of a collection policy

Challenges

Incomplete inventory:

-- The only collection-specific inventory is in a word processing file with bibliographic data and a short description of each work. While of some use, it is incomplete, never updated, and riddled with basic errors

Inconsistent cataloging:

--Artists' books are cataloged by the ULS's English-language cataloger, who has no specialized knowledge or training. Consequently, the cataloging varies widely and not all artists' books can be identified in PittCat by subject heading.

Bottom line:

There is no single place where the entire collection is identified or described as a coherent whole.

Creation of the Data Set

Trial and error!
Found no satisfactory way to automate the export of bibliographic data to a spreadsheet from the library's OPAC
First try: Exported data as *.csv from OPAC search result table
Exported only OCLC numbers, not bibliographic display

...not too useful.

Ultimately saved bibliographic data from subject heading searches as *.txt files, then compared to existing inventory
Imported to Excel
Manually reformatted into desired columns

Identified 167 artists' books
May still be incomplete, due to cataloging inconsistencies
Data set includes title, author, publisher, date of publication, place of publication, and up to 9 Library of Congress Subject Headings
Approximately 70 books identified using subject search "Artists' book" + location search "Fine Arts"
Remaining books identified using existing inventory; subject headings were then added by searching the OPAC, revealing that many books were not classified with LCSH "Artists' book", but with geographically specific and other subject headings

Recommendations

As digital tools develop further to add value and meaning to collections, libraries should make bibliographic data more easily exportable to file formats that can be used to visualize and explore (eg *.csv, *.xlsx, *.gexf)
Special collections should engage in cataloging practices that more effectively and consistently "tag" items to be searched as a coherent collection (eg use "Artists' book" as primary subject heading for all collection items before geographically subdividing)

Visualization goals

Identify tools that would reveal hidden relationships and trends within the collection items

mapping publication locations
subject/keyword word clouds
author relationship networks
timeline

Seed ideas for a digital exhibition of works from the collection

Secondary goal: to demonstrate value of digital tools for traditional library functions (like collection development) to library leadership

GEPHI

First turned to Gephi due to positive feedback from peers
Downloaded free version 0.8.2-beta

Attempted to upload as *.csv (converted from *.xlsx spreadsheet), but Gephi interpreted each word in the document as a separate node, resulting in thousands of decontextualized nodes
Not apparent how my data could be transformed into other formats supported by Gephi (GDF, GML, GEXF)
May be usable with additional time spent marking up data and watching tutorials, but was not easily accessible for the typical humanities librarian

ManyEyes

An IBM Research tool
Web based; not compatible with all browsers
Offers numerous visualization templates to be used with uploaded data
All datasets and visualizations are public, can be reused
Data can be cut and pasted from any table format into ManyEyes template - very user friendly
Not flexible with all types of data (eg dates are read as quantities: "2,013" instead of 2013)
Does not readily handle even moderately large data sets; was difficult to visualize 167 titles, 115 authors
Relies on Java plugin that crashes frequently

Uploaded 3 datasets with variations of my spreadsheet
Used to create 8 visualizations, of varying utility
Conveniently accessible in personal online profile

Growth of Collection By Date

Scatter graph plotting the number of works by date
Each dot is a title; mouse over to view title and date
First item is 1967; most recent is 2013
Suggests that collection is stronger in recent works

Cannot zoom in to view subtle changes in overall trend
Dates are treated as quantities, causing confusion

Prominence of Author + Location

Size of bubble corresponds to number of author's work in collection (sort of...)

Since dates are treated as quantities, the size of the bubble reflects the sum of the dates, rather than the actual number of instances. It still successfully identifies more prominent authors, since those with more items in the collection have larger date sums, but is confusing and over-represents more recent authors
Color of bubble corresponds to publication location. This is suggestive of potential geography-based author networks, and reveals when an author has published in more than one place

Treemap hierarchy: Date and Location

Each square represents an item
Full bibliographic data is revealed by hovering over squares
Collocated by place of publication
Color represents date of publication; lighter color indicates older, darker color indicates more recent
Indirectly displays prominence of publication location
Could show trends in location of publishing (eg works from New York are older; works from Portland are more recent)

Treemap Hierarchy: Author and Title

Collocated by author
Another method of showing author prominence
Subdivided by title
Full bibliographic data revealed by hovering over each square

Matrix Chart: Location

In this iteration, shows prominence of each location and the authors associated with location; can also show the publishers associated with location
Particularly unwieldy for a large dataset

MAPS: Place of Publication (Us)

Shows in which states collection items have been published
Was only able to choose one representative item for each state - all subsequent items from an already-selected state had to be manually "ignored"
Time consuming to comb through all location data to identify new states
Does not reflect relative prominence of each state or collocate the various titles from each state
World maps and maps of some states/provinces by county also available

Word Cloud 1

Word cloud compiled from primary subject headings
Not manipulated
Unable to treat each subject heading as a coherent phrase; split up into 2-word phrases
Reveals most common primary subject headings, but reveals almost nothing about topical content of the items
A few themes do emerge (eg "September 11")
Searchable

Word Cloud 2

Drawn from all 9 columns of subject headings
Heavily manipulated to remove most common, non-descriptive headings and attempt to reveal topical content
"Banned words": and, art, Art, Artists, books, books--New, books--Specimens, books--United, in, of, State, States, States--Specimens, the, to
Also treats each word separately, not entire heading
Not searchable or zoomable

Click2Map

Free service that enables plotting locations from a file onto a Google map
Not visually appealing
Does not allow for association of other data with location points
Was not readily able to find other similar services
Easy to upload: cut and pasted location column into a new spreadsheet and saved as *.csv
Uploaded directly to map editor; automatically assigned latitude and longitude data
Searchable and zoomable
Free to create and download, but costly to publish online

SHODOR

Simple histogram generator
Can be printed, but not published online
Shows frequency of publication date:

Conclusions

Despite proliferation of free online tools, few are accommodating to the types of data and relationships relevant to analyzing library collections and accessible to average humanities librarians
Even these flawed visualizations, however, hint at the possibilities of visualizing this kind of data
Existing tools should be made more flexible to accommodate dates, key phrases, and larger data sets
Possible market for new specialized tools geared towards librarians and library collections

Visualizing Library Collections

By Anna-Sophia Zingarelli-Sweet

Visualizing Library Collections

3,184

Anna-Sophia Zingarelli-Sweet

cultural heritage + metadata + climate resilience

Visualizing Library Collections

The artists' book collection

Why Visualize?

Challenges

Creation of the Data Set

Recommendations

Visualization goals

Treemap hierarchy: Date and Location

Treemap Hierarchy: Author and Title

Matrix Chart: Location

MAPS: Place of Publication (Us)

Conclusions

Visualizing Library Collections

More from Anna-Sophia Zingarelli-Sweet