How AI Will Transform Library Cataloging

As digital collections expand exponentially, libraries face an overwhelming challenge: cataloging materials faster than they can accumulate them. The Library of Congress is exploring whether artificial intelligence can help catalogers by automating metadata generation and speeding up description workflows (1). This technological shift promises to revolutionize how libraries organize and provide access to information.

The Promise of AI-Powered Cataloging

The potential benefits of AI in library cataloging are substantial. Artificial intelligence can automate and potentially improve cataloging, which is a significant area in librarianship (2). By handling repetitive tasks, AI could free librarians to focus on more complex work requiring human expertise and judgment.

Detail from an Assisted Cataloging HITL Workflow Prototype for evaluating the quality of suggested subject terms. There are options to mark ”Acceptable”, ”Too Broad”, ”Too Narrow”, ”Wrong,” and to add comments for each suggested subject term.

In experiments with approximately 23,000 ebooks, the Library of Congress tested five open-source machine learning models to predict required metadata such as titles, authors, subjects, genres, dates, and identifiers (1). The results were mixed but instructive. Transformer-based models showed particular success with token classification tasks like predicting titles and authors. However, none reached the Library’s quality threshold of 95% accuracy except for identifying Library of Congress Control Numbers.

Subject classification proved especially challenging. The Annif model, developed by the National Library of Finland for automated subject indexing, achieved only a 35% accuracy rate in classifying subjects from text (1). Large Language Models fared better in some areas, with specific fields reaching 90% accuracy. Still, LLMs scored only 26% when asked to predict the same Library of Congress Subject Headings terms that catalogers assigned (1).

The Human-in-the-Loop Approach

These results illuminate a crucial reality: AI will augment rather than replace human catalogers. Participants at a 2024 forum on AI and metadata management emphasized that AI should support, not replace, library professionals’ expertise, stressing the importance of human judgment in maintaining quality and integrity (3).

The Library of Congress has embraced this philosophy through human-in-the-loop (HITL) workflows. Two prototypes were developed where machine learning suggests possible Library of Congress Subject Headings and author names from the Name Authority File, which catalogers then review and select from (1). This collaborative approach leverages AI’s processing power while preserving the nuanced decision-making that defines professional cataloging.

Catalogers who tested the HITL prototypes showed curiosity and an open mind, particularly appreciating how machine learning could augment and support their work (1). This positive reception suggests that librarians recognize AI as a tool for enhancement rather than a threat to their profession.

Significant Challenges Remain

A computer used to find books and media at Litchfield Park Library on April 18, 2024. Integrating artificial intelligence into library services may change the way information is retrieved and categorized.

The path to widespread AI adoption in cataloging faces substantial obstacles. The Library of Congress discovered that starting with only 23,000 ebooks for training data was insufficient, and performance would likely improve with the whole corpus of 100,000 available ebooks (1). This “more is more” principle for training data presents challenges for smaller institutions with limited resources.

The complexity of cataloging tasks themselves poses another barrier. Selecting the correct string of subjects for a book from approximately 450,000 possible subjects is incredibly challenging for both humans and AI (1). Bibliographic records can include multiple subject fields with multiple terms each, and about half of the subject terms in training data appeared only once, creating what experts call “extreme multilabel text classification.”

Beyond technical challenges, practical concerns loom large. Terms of service for some AI models aren’t compliant with federal security and privacy regulations, particularly regarding how federal agencies must treat data (1). Libraries must also consider long-term costs, infrastructure requirements, and environmental impacts as they evaluate AI adoption.

The Road Ahead

Despite these challenges, the trajectory is clear. The Library of Congress began a third phase of experiments in August 2024, testing machine learning earlier in the pre-publication workflow to create bibliographic metadata for both print and ebooks, with plans to involve other institutions in assessment and review (1). This collaborative, phased approach reflects the careful implementation needed for such a fundamental shift in library operations.

The success of AI in library cataloging will depend on developing robust infrastructure for managing, testing, and updating models; creating balanced training datasets that reflect the diversity of cataloged content; establishing new quality standards and review processes; and training staff to work effectively with these new tools. Implementation will require time and resources, including feedback mechanisms from catalogers and users to inform model performance and systems to manage all associated data (1).

Libraries have historically adapted to technological change, from creating machine-readable MARC formats to pioneering digital preservation programs. AI represents the next chapter in this evolution. While wholesale automation of bibliographic description remains a distant prospect, AI-assisted cataloging workflows that combine machine efficiency with human expertise are already emerging. The future of library cataloging will be neither fully automated nor unchanged—it will be a thoughtful synthesis of artificial and human intelligence, each contributing what it does best to ensure quality access to information.

 

References:

  1. Weinryb-Grohsgal, L., Potter, A., & Saccucci, C. (2024, November). Could Artificial Intelligence Help Catalog Thousands of Digital Library Books? An Interview with Abigail Potter and Caroline Saccucci. The Signal. Library of Congress. https://blogs.loc.gov/thesignal/2024/11/could-artificial-intelligence-help-catalog-thousands-of-digital-library-books-an-interview-with-abigail-potter-and-caroline-saccucci/ 
  2. Cronkite News. (2024, May 7). How AI may impact libraries, research, and information retrieval. https://cronkitenews.azpbs.org/2024/05/07/ai-library-research-cataloging-information-retrieval-artificial-intelligence/
  3. ALA Core News. (2024, October 24). AI Insights: Revolutionizing Metadata Management. https://alacorenews.org/2024/10/24/ai-insights-revolutionizing-metadata-management/