Department of Information Technology

q2b – From Quill to Bytes

Description

This cross-disciplinary initiative takes as its point of departure the analysis of handwritten manuscripts using computational methods from image analysis and linguistics. It sets out to develop manuscript analysis technology that provides automatic tools for large-scale transcription, linguistic analysis, digital paleography, and generic data mining of historical manuscripts. Our mission is to develop technology that will push the digital horizon back in time by enabling digital analysis of handwritten historical materials for both researchers and the public.

Funding

2016 – 2021: Riksbankens Jubileumsfond, Jubileumsutlysningen Nya utsikter för humaniora och samhällsvetenskap, 13.4 MSEK. New Eyes on Sweden's Medieval Scribes. Scribal Attribution using Digital Palaeography in the Medieval Gothic Script. (Dnr NHS14-2068:1, PI Lasse Mårtensson)
2013 – 2017: Vetenskapsrådet, framework grant Searching and datamining in Large Collections of Historical Handwritten Documents, 13.7 MSEK (Dnr 2012-5743, PI Anders Brun).
2017 – 2019: Swedish e-Science Academy (eSSENCE), an e-science project on automatic handwritten text recognition (PI Anders Hast).
2011 – 2015: Support from the Rector of Uppsala University.
2009: Support from Historisk-filosofiska fakulteten, Språkvetenskapliga fakulteten, Teknisk-Naturvetenskapliga fakulteten, Department of Linguistics and Philology, Department of Information Technology, Department of History. This support was originally organized by EPARIT and later by SALT.

People

  • Dr. Anders Brun, Researcher (PI), [1]
  • Dr. Lasse Mårtensson (PI), associate professor [2]
  • Dr. Anders Hast, associate professor
  • Dr. Mats Dahllöf, associate professor
  • Dr. Alicia Fornés, Universitat Autònoma de Barcelona
  • Dr. Carl Nettelblad, Associate Senior Lecturer
  • Dr. Jonas Lindström, Researcher
  • Dr. Ekta Vats, Postdoc
  • Dr. Fredrik Wahlberg, Postdoc
  • Dr. Sukalpa Chanda, Postdoc
  • Kalyan Ram, PhD student
  • Tomas Wilkinson, PhD student
  • Raphaela Heil, PhD student

Previous Staff and Alumni

  • Carl Carenwall, thesis worker
  • Luis Hermosa Santos, thesis worker
  • Bojana Simsic, project assistant

Talks and Science Outreach

Making document collections searchable and readable by using handwritten text recognition techniques – possibilities and limitations, Workshop on "Automated Registration of Historical Population Registers: New Prospects and Possibilities", Anders Hast, Lund, 2019-02-14.
Introductory tutorial to deep learning, Kalyan Ram, HTR workshop 2018, Noor Slott, https://drive.google.com/drive/folders/1x5RmuyCurdsZs0bFy0vkQS17paXTzPoz?usp=sharing
Lecture at Ghent University, Statistical Tools and Methods and their applications in Image Processing, Computer Vision and HTR, Anders Hast, 2017-05-04.
Lecture at Ghent University, Statistical Tools and Methods and their applications in Image Processing, Computer Vision and HTR, Anders Hast, 2016-05-04.
Times Higher Education, Illuminating manuscripts for the digital age, 2015-06-18, https://www.timeshighereducation.com/news/illuminating-manuscripts-digital-age
Rötter, Snart kan googling av handskrifter bli verklighet, 2015-04-16, http://www.rotter.se/123-nyheter/2013/1415-googling-av-handskrifter-kan-vara-moejligt-snart
Uppsala University news, Ett Google för Handskrifter, 2015-04-08. http://www.uu.se/press/nyheter/artikel/?id=4386.
Uppsala University news, A Google for Handwriting, 2015-04-28. http://www.uu.se/en/media/news/article/?id=4574&typ=artikel&area=10&lang=en
Digikult 2015, Storskalig datautvinning från historiska handskrivna texter, Anders Brun. https://youtu.be/4AZa9JRSNtk
Mobilising the National Heritage: Symposium on the Digitization of Swedish Natural History Collections, Challenges and advances in OCR and handwriting recognition, Anders Brun, 2014-01-09.
Att synliggöra verkligheten, interview with Fredrik Wahlberg and Tomas Wilkinson, Kunskapskanalen, 10/11 2013. http://urskola.se/Produkter/177471-UR-Samtiden-Tema-Att-synliggora-verkligheten
Project showcase at Bok- och Bibliotek, Göteborg Book Fair 2013, http://www.bokmassan.se
Från texter till bilder, Magasin Ping nr 5, 2013. http://www.dik.se/nyheter/fraan-texter-till-bilder/

Publications

E. Vats, A. Hast, A. Fornés, Training-Free and Segmentation-Free Word Spotting using Feature Matching and Query Expansion, International Conference on Document Analysis and Recognition (ICDAR), 2019. [3]

A. Hast, M. Lind, E. Vats, Embedded Prototype Subspace Classification: A subspace learning framework. In: Vento M., Percannella G. (eds) Computer Analysis of Images and Patterns, CAIP 2019, Lecture Notes in Computer Science, vol 11679, Springer, Cham, 2019. [4]

A. Hast, M. Lind, E. Vats, Subspace Learning and Classification, 3rd Swedish Symposium on Deep Learning (SSDL), 2019.

L. Mårtensson, E. Vats, A. Hast, A. Fornés, In Search of the Scribe. Letter Spotting as a Tool for Identifying Scribes in Large Handwritten Text Corpora, in HUMAN IT: Nordic Digital Humanities Journal, 2019. [5]

A. Hast, P. Cullhed, E. Vats, M. Abrate, Making Large Collections of Handwritten Material Easily Accessible and Searchable, In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science, IRCDL 2019, Communications in Computer and Information Science, vol 988, 2019. [6]

A. Hast, L. Mårtensson, E. Vats, R. Heil, Creating an Atlas over Handwritten Script Signs, Digital Humanities in the Nordic countries (DHN), Copenhagen, 2019. [7]

M. Dahllöf, Clustering writing components from medieval manuscripts. In Michael Piotrowski (ed.), Proceedings of the Workshop on Computational Methods in the Humanities 2018, 23-32, 2019. [8]

M. Dahllöf, Automatic Scribe Attribution for Medieval Manuscripts. Digital Medievalist, 11(1): 1-26, 2018. [9]

R. Heil, E. Vats, A. Hast, Exploring the Applicability of Capsule Networks for Word Spotting in Historical Handwritten Manuscripts, 2nd Swedish Symposium on Deep Learning, Gothenburg, 2018.

A. Hast, E. Vats, Radial Line Fourier descriptor for historical handwritten text representation, Journal of WSCG, ISSN 1213-6972, Vol. 26, No. 1, 31-40, 2018. [10]

A. Hast, E. Vats, Radial Line Fourier descriptor for historical handwritten text representation, Proceedings of 26th International Conference on Computer Graphics, Visualization and Computer Vision (WSCG), 2018. [11]

A. Hast, E. Vats, An Intelligent User Interface for Efficient Semi-automatic Transcription of Historical Handwritten Documents, Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion (IUI'18), Tokyo, Japan, ACM, pp. 48:1-48:2, 2018. [12]

P. Singh, E. Vats, A. Hast, Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing, Proceedings of 13th IAPR International Workshop on Document Analysis Systems (DAS), 2018. [13]

E. Vats, A. Hast, L. Mårtensson, Extracting script features from a large corpus of handwritten documents, Proceedings of Digital Humanities in the Nordic countries (DHN), 2018. Extended Abstract.

A. Hast, P. Cullhed, E. Vats, TexT - Text Extractor Tool for Handwritten Document Transcription and Annotation, In: Serra G., Tasso C. (eds) Digital Libraries and Multimedia Archives, IRCDL 2018, Communications in Computer and Information Science, volume 806. Springer, Cham, 2018. [14]

E. Vats, A. Hast, Historical Handwritten Text Recognition, Poster presentation at Swedish e-Science Academy, Umeå, Oct 11-12, 2017.

E. Vats, A. Hast, On-the-fly Historical Handwritten Text Annotation, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, pp. 10-14, 2017. [15]

E. Vats, A. Hast, P. Singh, Automatic Document Image Binarization using Bayesian Optimization, Proceedings of International Workshop on Historical Document Imaging and Processing (HIP2017), ACM Digital Library, New York, USA, pp. 89-94, 2017. [16]

A. Hast, A. Fornés, A Segmentation-Free Handwritten Word Spotting Approach by Relaxed Feature Matching, 12th IAPR International Workshop on Document Analysis Systems (DAS), 2016.

F. Wahlberg, L. Mårtensson, A. Brun, Large scale continuous dating of medieval scribes using a combined image and language model, 12th IAPR International Workshop on Document Analysis Systems (DAS), 2016.

K. Ayyalasomayajula, A. Brun, Topological clustering guided document binarization, Proceedings of SSBA, 2015.

F. Wahlberg, L. Mårtensson, A. Brun, Large scale style based dating of medieval manuscripts, Proc. 3rd International Workshop on Historical Document Imaging and Processing, ACM Digital Library, 2015.

F. Wahlberg, L. Mårtensson, A. Brun, Writer identification using the Quill-Curvature feature in old manuscripts, Proceedings of SSBA, 2015.

T. Wilkinson, A. Brun, Visualizing document image collections using image-based word clouds, Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015.

T. Wilkinson, A. Brun, A novel word segmentation method based on object detection and deep learning, Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015.

T. Wilkinson, A. Brun, Experiments on Large Scale Document Visualization using Image-based Word Clouds, Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203, 2015.

M. Dahllöf, Predicting the Scribe Behind a Page of Medieval Handwriting, SLTC 2014, The Fifth Swedish Language Technology Conference, 2014.

M. Dahllöf, Scribe Attribution for Early Medieval Handwriting by Means of Letter Extraction and Classification and a Voting Procedure for Larger Pieces, ICPR 2014.

K. R. Ayyalasomayajula, A. Brun, Document binarization using topological clustering guided Laplacian Energy Segmentation, ICFHR 2014.

F. Wahlberg, L. Mårtensson, A. Brun, Scribal Attribution using a Novel 3-D Quill-Curvature Feature Histogram, ICFHR, 2014.

F. Wahlberg, M. Dahllöf, L. Mårtensson, A. Brun, Spotting Words in Medieval Manuscripts, Studia Neophilologica, ISSN 0039-3274, Vol. 86, 171-186, 2014.

F. Wahlberg and A. Brun, Feature Space Denoising Improves Word Spotting, accepted for the HIP Workshop (ICDAR), 2013.

F. Wahlberg and A. Brun, Feature weight optimization and pruning in historical text recognition, Proceedings of ISVC, 2013.

C. Carenwall, Adaptive binarization of 17th century printed text, student thesis, Uppsala universitet, 2012.

F. Wahlberg and A. Brun. Graph based line segmentation on cluttered handwritten manuscripts, Proceedings of ICPR, 2012.

F. Wahlberg, M. Dahllöf, L. Mårtensson, and A. Brun. Word Spotting in Pre-Modern Manuscripts using Dynamic Time Warping. Proceedings of SSBA 2012.

F. Wahlberg, M. Dahllöf, L. Mårtensson, and A. Brun. Data mining medieval documents by word spotting. In Proc. of Historical Document Imaging and Processing (HIP) 2011, ACM International Conference Proceedings Series, 2011.

Related Publications

Eva Pettersson, Beáta Megyesi and Joakim Nivre (2012), Rule-based normalisation of historical text - A diachronic study, Proceedings of the First international Workshop on Language Technology for Historical Text(s). KONVENS, Vienna, Austria, September 2012.

Eva Pettersson, Beáta Megyesi and Joakim Nivre (2012), Parsing the Past - Identification of Verb Constructions in Historical Text, Proceedings of the 6th EACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Avignon, France, April 2012.

Eva Pettersson and Joakim Nivre (2011), Automatic Verb Extraction from Historical Swedish Texts, Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Portland, OR, USA, June 2011.

OCR-läsning av äldre källmaterial -- Vad kan (och bör) man göra?
Jonas Lindström, Anders Brun, Bengt Dahlqvist, 2009.

Den svenska statskyrkans grunddokument -- Dokumentation av en digitaliseringsprocess
Jonas Lindström, 2010.

Automatic verb extraction from historical Swedish texts
Eva Pettersson, May 12, 2010.

Sökbarhet i digitaliserade dokument -- Metoder och överväganden
Bengt Dahlqvist, May 10, 2010.

J. Lindström, A. Brun, and B. Dahlqvist. OCR-läsning av äldre källmaterial - vad kan (och bör) man göra? Technical report, SALT - Studies in Art, Languages and Theology, Uppsala University, 2010.

Uppsala University research, education and innovation strategies 2013 – 2016. Dnr UFV 2011/133, November 18, 2011.

Updated  2020-02-27 19:03:08 by Ekta Vats.