Methodology and Architecture
Within the DLBT, iConTxt introduces an AI layer that enhances and partially automates editorial workflows.
The methodological framework combines four components: data import and validation, metadata enrichment, the generation of iConTxtInfo packages, and contextual knowledge generation:
- Data Import and Validation: Automated procedures verify bibliographical records during upload, identify inconsistencies, and detect duplicate entries. Integrated OCR correction modules improve the quality of digitized texts and facilitate full-text search.
- Metadata Enrichment: The system refines and expands metadata by linking authors, translators, publishers, and works. It cross-references these records with external authority files such as VIAF[1] and GND[2] and with the Wikidata knowledge base, ensuring semantic consistency and enabling multilingual interoperability.
- Generation of iConTxtInfo Packages: In the first stage, the iConTxt workflow uses artificial intelligence models to process individual reception documents and generate structured data units called "iConTxtInfo packages". Each package contains an English-language summary, a central question, and thematic keywords. Together, the packages form a unified knowledge base that improves accessibility and facilitates cross-document comparison; this knowledge base serves as the foundation for further AI-driven contextualization and research.
- Contextual Knowledge Generation: In the second stage, a Retrieval-Augmented Generation (RAG)[3] pipeline combines this knowledge base with DLBT metadata and external data sources, such as selected web resources, Wikipedia, Wikidata, and specialized literary lexica, to generate new texts. Because generation draws on retrieved documents, the output remains grounded in verifiable sources rather than in model inference alone. To incorporate newly digitized documents and metadata updates, the AI-generated texts will be regenerated every six months; limiting large-scale model usage to these periodic update cycles also reduces the ecological footprint.
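The duplicate detection described under Data Import and Validation can be illustrated with a minimal sketch. It assumes that records arrive as dictionaries with hypothetical `author`, `title`, and `year` fields and that two records count as duplicates when their normalized author, title, and year coincide. A production system would likely use fuzzier matching; this is not the DLBT implementation.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Case-fold, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(cleaned.split())


def match_key(record: dict) -> tuple:
    # Hypothetical matching key: normalized author + normalized title + year.
    return (normalize(record["author"]), normalize(record["title"]), record.get("year"))


def find_duplicates(records: list[dict]) -> list[list[int]]:
    """Return groups of record indices that share the same matching key."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for index, record in enumerate(records):
        groups[match_key(record)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


def validate(record: dict) -> list[str]:
    """Report missing mandatory fields (the field list is an assumption)."""
    return [f"missing {name}" for name in ("author", "title", "year") if not record.get(name)]
```

Because `\w` is Unicode-aware in Python, the normalization also works for Cyrillic or German records, and case folding makes "Goethe" and "goethe" compare equal without stripping diacritics that may be meaningful.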
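The linking step described under Metadata Enrichment can be sketched as a lookup from name variants to authority identifiers. This assumes a local, previously harvested table of authority entries; the identifiers below are placeholders, not real GND or VIAF records, and the record field `translator` is hypothetical. Querying the actual VIAF or GND services is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityEntry:
    authority_id: str                 # placeholder for a GND/VIAF/Wikidata identifier
    preferred_name: str
    variants: set[str] = field(default_factory=set)  # e.g. Latin and Cyrillic spellings


def build_index(entries: list[AuthorityEntry]) -> dict[str, AuthorityEntry]:
    """Map every case-folded name form to its authority entry."""
    index: dict[str, AuthorityEntry] = {}
    for entry in entries:
        for name in {entry.preferred_name, *entry.variants}:
            index[name.casefold()] = entry
    return index


def enrich(record: dict, index: dict[str, AuthorityEntry]) -> dict:
    """Return a copy of the record with the translator linked to an authority entry."""
    enriched = dict(record)
    entry = index.get(record["translator"].casefold())
    enriched["translator_authority_id"] = entry.authority_id if entry else None
    if entry:
        enriched["translator_preferred_name"] = entry.preferred_name
    return enriched
```

Linking all spelling variants to one identifier is what allows records in different scripts and languages to be treated as referring to the same person.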
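The structure of an iConTxtInfo package can be made concrete as a small validated record. The three content fields come from the description above; the `document_id` field, the JSON keys expected from the model (`summary`, `central_question`, `keywords`), and the keyword normalization are assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IConTxtInfoPackage:
    document_id: str           # hypothetical link back to the reception document
    summary_en: str            # English-language summary
    central_question: str
    keywords: tuple[str, ...]  # thematic keywords

    def __post_init__(self) -> None:
        if not self.summary_en.strip():
            raise ValueError("summary must not be empty")
        if not self.central_question.strip():
            raise ValueError("central question must not be empty")
        if not self.keywords:
            raise ValueError("at least one keyword is required")


def parse_model_output(document_id: str, raw: str) -> IConTxtInfoPackage:
    """Turn a (hypothetical) JSON answer from the AI model into a validated package."""
    data = json.loads(raw)
    # Deduplicate, lower-case, and sort keywords so packages are comparable.
    keywords = tuple(sorted({k.strip().lower() for k in data["keywords"] if k.strip()}))
    return IConTxtInfoPackage(
        document_id, data["summary"].strip(), data["central_question"].strip(), keywords
    )


def to_json(package: IConTxtInfoPackage) -> str:
    return json.dumps(asdict(package), ensure_ascii=False)
```

Normalizing keywords at creation time is one way to make cross-document comparison a matter of simple set operations on the stored packages.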
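The retrieval and grounding step of the RAG stage, together with the six-month regeneration cycle, can be sketched as follows. This is a simplified stand-in: it ranks passages by bag-of-words cosine similarity rather than the embedding-based retrieval a real pipeline would probably use, the source identifiers are hypothetical, and the actual call to a language model is deliberately omitted.

```python
import calendar
import math
import re
from collections import Counter
from datetime import date


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.casefold())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (source_id, text) passages with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), sid, text) for sid, text in passages]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(sid, text) for score, sid, text in scored[:k] if score > 0]


def build_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Ground the (omitted) generation step in retrieved, citable sources."""
    context = "\n".join(f"[{sid}] {text}" for sid, text in retrieved)
    return ("Answer using only the sources below and cite their IDs.\n"
            f"Sources:\n{context}\n\nQuestion: {question}")


def add_months(d: date, months: int) -> date:
    year, month0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def regeneration_due(last_run: date, today: date, months: int = 6) -> bool:
    """True once the periodic regeneration interval has elapsed."""
    return today >= add_months(last_run, months)
```

Requiring the model to cite the bracketed source IDs is what keeps each generated statement traceable to a retrieved document.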
This architecture allows iConTxt to function as a hybrid system that combines automated data processing with human editorial oversight. The iterative workflow supports transparency, reproducibility, and long-term sustainability, all of which are essential criteria for integrating AI technologies into digital humanities infrastructures.
[1] VIAF (Virtual International Authority File): An international authority file that combines multiple national library authority files into a single service.
[2] GND (Gemeinsame Normdatei): The Integrated Authority File used by German-speaking libraries for standardized cataloging of persons, corporate bodies, conferences, geographic entities, topics, and works.
[3] RAG (Retrieval-Augmented Generation): An AI technique that combines information retrieval with text generation, allowing AI models to access external knowledge bases before generating responses, which improves accuracy and reduces hallucinations.