Department of Information Technology

q2b – From Quill to Bytes

Description

This cross-disciplinary initiative takes its point of departure in the analysis of handwritten manuscripts using computational methods from image analysis and linguistics. It sets out to develop manuscript analysis technology that provides automatic tools for large-scale transcription, linguistic analysis, digital paleography, and generic data mining of historical manuscripts. Our mission is to develop technology that pushes the digital horizon back in time by enabling digital analysis of handwritten historical materials for both researchers and the public.

Funding

2016 – 2021: Riksbankens Jubileumsfond, Jubileumsutlysningen Nya utsikter för humaniora och samhällsvetenskap, 13.4 MSEK. New Eyes on Sweden's Medieval Scribes. Scribal Attribution using Digital Palaeography in the Medieval Gothic Script (Dnr NHS14-2068:1, PI Lasse Mårtensson).
2013 – 2017: Vetenskapsrådet, framework grant, Searching and datamining in Large Collections of Historical Handwritten Documents, 13.7 MSEK (Dnr 2012-5743, PI Anders Brun).
2011 – 2015: Support from the Rector of Uppsala University.
2009 – Support from Historisk-filosofiska fakulteten, Språkvetenskapliga fakulteten, Teknisk-Naturvetenskapliga fakulteten, Department of Linguistics and Philology, Department of Information Technology, Department of History. Originally this was organized by EPARIT and later by SALT.

People

  • Dr. Anders Brun, Researcher (PI)
  • Dr. Mats Dahllöf, associate professor
  • Dr. Alicia Fornés, Universitat Autònoma de Barcelona
  • Dr. Anders Hast, associate professor
  • Dr. Jonas Lindström, Researcher
  • Dr. Lasse Mårtensson, associate professor (PI)
  • Dr. Carl Nettelblad, Associate Senior Lecturer
  • Dr. Ekta Vats, Postdoc
  • Kalyan Ram, PhD student
  • Fredrik Wahlberg, PhD student
  • Tomas Wilkinson, PhD student

Previous Staff and Alumni

  • Carl Carenwall, thesis worker
  • Luis Hermosa Santos, thesis worker
  • Bojana Simsic, project assistant

Talks and Science Outreach

Lecture at Ghent University, Statistical Tools and Methods and their applications in Image Processing, Computer Vision and HTR, Anders Hast, 2017-05-04.
Lecture at Ghent University, Statistical Tools and Methods and their applications in Image Processing, Computer Vision and HTR, Anders Hast, 2016-05-04.
Times Higher Education, Illuminating manuscripts for the digital age, 2015-06-18, https://www.timeshighereducation.com/news/illuminating-manuscripts-digital-age
Rötter, Snart kan googling av handskrifter bli verklighet, 2015-04-16, http://www.rotter.se/123-nyheter/2013/1415-googling-av-handskrifter-kan-vara-moejligt-snart
Uppsala University news, Ett Google för Handskrifter, 2015-04-08, http://www.uu.se/press/nyheter/artikel/?id=4386
Uppsala University news, A Google for Handwriting, 2015-04-28, http://www.uu.se/en/media/news/article/?id=4574&typ=artikel&area=10&lang=en
Digikult 2015, Storskalig datautvinning från historiska handskrivna texter, Anders Brun. https://youtu.be/4AZa9JRSNtk
Mobilising the National Heritage: Symposium on the Digitization of Swedish Natural History Collections, Challenges and advances in OCR and handwriting recognition, Anders Brun, 2014-01-09.
Att synliggöra verkligheten, interview with Fredrik Wahlberg and Tomas Wilkinson, Kunskapskanalen, 10/11 2013, http://urskola.se/Produkter/177471-UR-Samtiden-Tema-Att-synliggora-verkligheten
Project showcase at Bok- och Bibliotek, Göteborg Book Fair 2013, http://www.bokmassan.se
Från texter till bilder, Magasin Ping nr 5, 2013. http://www.dik.se/nyheter/fraan-texter-till-bilder/

Publications

E. Vats, A. Hast, On-the-fly Historical Handwritten Text Annotation, To appear in the Proceedings of International Workshop on Human-Document Interaction, 2017.

E. Vats, A. Hast, P. Singh, Automatic Document Image Binarization using Bayesian Optimization, To appear in the Proceedings of International Workshop on Historical Document Imaging and Processing, ACM Digital Library, 2017.

A. Hast, P. Cullhed, E. Vats, TexT - Text Extractor Tool for Handwritten Document Transcription and Annotation, To appear in the Proceedings of the Italian Research Conference on Digital Libraries (IRCDL), 2017.

A. Hast, A. Fornés, A Segmentation-Free Handwritten Word Spotting Approach by Relaxed Feature Matching, 12th IAPR International Workshop on Document Analysis Systems (DAS), 2016.

F. Wahlberg, L. Mårtensson, A. Brun, Large scale continuous dating of medieval scribes using a combined image and language model, 12th IAPR International Workshop on Document Analysis Systems (DAS), 2016.

K. Ayyalasomayajula, A. Brun, Topological clustering guided document binarization, Proceedings of SSBA, 2015.

F. Wahlberg, L. Mårtensson, A. Brun, Large scale style based dating of medieval manuscripts, Proc. 3rd International Workshop on Historical Document Imaging and Processing, ACM Digital Library, 2015.

F. Wahlberg, L. Mårtensson, A. Brun, Writer identification using the Quill-Curvature feature in old manuscripts, Proceedings of SSBA, 2015.

T. Wilkinson, A. Brun, Visualizing document image collections using image-based word clouds, Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015.

T. Wilkinson, A. Brun, A novel word segmentation method based on object detection and deep learning, Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015.

T. Wilkinson, A. Brun, Experiments on Large Scale Document Visualization using Image-based Word Clouds, Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203, 2015.

M. Dahllöf, Predicting the Scribe Behind a Page of Medieval Handwriting, SLTC 2014, The Fifth Swedish Language Technology Conference, 2014.

M. Dahllöf, Scribe Attribution for Early Medieval Handwriting by Means of Letter Extraction and Classification and a Voting Procedure for Larger Pieces, ICPR 2014.

K. R. Ayyalasomayajula, A. Brun, Document binarization using topological clustering guided Laplacian Energy Segmentation, ICFHR 2014.

F. Wahlberg, L. Mårtensson, A. Brun, Scribal Attribution using a Novel 3-D Quill-Curvature Feature Histogram, ICFHR, 2014.

F. Wahlberg, M. Dahllöf, L. Mårtensson, A. Brun, Spotting Words in Medieval Manuscripts, Studia Neophilologica, ISSN 0039-3274, Vol. 86, 171-186, 2014.

F. Wahlberg and A. Brun, Feature Space Denoising Improves Word Spotting, accepted for the HIP Workshop (ICDAR), 2013.

F. Wahlberg and A. Brun, Feature weight optimization and pruning in historical text recognition, Proceedings of ISVC, 2013.

C. Carenwall, Adaptive binarization of 17th century printed text, student thesis, Uppsala universitet, 2012.

F. Wahlberg and A. Brun. Graph based line segmentation on cluttered handwritten manuscripts, Proceedings of ICPR, 2012.

F. Wahlberg, M. Dahllöf, L. Mårtensson, and A. Brun. Word Spotting in Pre-Modern Manuscripts using Dynamic Time Warping. Proceedings of SSBA 2012.

F. Wahlberg, M. Dahllöf, L. Mårtensson, and A. Brun. Data mining medieval documents by word spotting. In Proc. of Historical Document Imaging and Processing (HIP) 2011, ACM International Conference Proceedings Series, 2011.

Related Publications

Eva Pettersson, Beáta Megyesi and Joakim Nivre (2012), Rule-based normalisation of historical text - A diachronic study, Proceedings of the First International Workshop on Language Technology for Historical Text(s). KONVENS, Vienna, Austria, September 2012.

Eva Pettersson, Beáta Megyesi and Joakim Nivre (2012), Parsing the Past - Identification of Verb Constructions in Historical Text, Proceedings of the 6th EACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Avignon, France, April 2012.

Eva Pettersson and Joakim Nivre (2011), Automatic Verb Extraction from Historical Swedish Texts, Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Portland, OR, USA, June 2011.

OCR-läsning av äldre källmaterial -- Vad kan (och bör) man göra?
Jonas Lindström, Anders Brun, Bengt Dahlqvist, 2009.

Den svenska statskyrkans grunddokument -- Dokumentation av en digitaliseringsprocess
Jonas Lindström, 2010.

Automatic verb extraction from historical Swedish texts
Eva Pettersson, May 12, 2010.

Sökbarhet i digitaliserade dokument -- Metoder och överväganden
Bengt Dahlqvist, May 10, 2010.

J. Lindström, A. Brun, and B. Dahlqvist. OCR-läsning av äldre källmaterial - vad kan (och bör) man göra? Technical report, SALT - Studies in Art, Languages and Theology, Uppsala University, 2010.

Uppsala University research, education and innovation strategies 2013 - 2016. Dnr UFV 2011/133, November 18, 2011.

Updated  2017-10-17 16:08:55 by Anders Hast.