I started my academic career in 2000 at the Faculty of Humanities and Social Sciences, University of Zagreb, researching language from the perspective of information sciences and computational linguistics. In 2005 I moved to the Institute of Croatian Language and Linguistics, where I continued working on digitization and corpus linguistics. Although my passion for languages originated during my high school education in old lexicography, ancient writings and orthography, my professional career in sociolinguistics started in 2008 and has focused on orthography ever since.

My core sociolinguistic aim is to study the development of literacy (and related ideological frameworks, such as the history of the language norm) through writing from Antiquity to today. I believe that researching orthography can shed valuable light on its role in the (de-)construction of nations and regions. I am open to networking with colleagues who share similar interests.

My main research interests are sociolinguistics and corpus linguistics with the following subjects:

  • orthography, punctuation, language standardization, spelling reforms
  • language legislation, language policy and planning, language ideology
  • South Slavic and North Germanic language groups
  • lexicography, e-lexicography, terminology
  • language e-literacy, e-learning
  • language technologies
  • methods and ethics in scientific methodology and research

Work experience

  • 2005 – present. Institute of Croatian Language and Linguistics. Senior Associate.
  • 2005 – 2015. Polytechnic of Zagreb (Tehničko veleučilište u Zagrebu). Lecturer (since 2008) and senior lecturer (since 2014)
  • March – September 2007. Microsoft Development Center Serbia & Microsoft Croatia. STE (Software Test Engineer). Development of Croatian handwriting recognition for the Microsoft TabletPC technology
  • April 2000 – May 2005. Faculty of Humanities and Social Sciences, University of Zagreb. Department of Information Sciences. Young researcher.
  • 2000 – 2001. Novi Liber (publishing company). Lexicographer. Enciklopedijski rječnik hrvatskoga jezika [Encyclopedic Dictionary of Croatian]
  • 1996. ComputerWorld, bimonthly magazine, and Večernji list, daily newspaper. Proofreader.
  • 1994 – 1996. Matica hrvatska [Matrix Croatica]. Computational processing of dictionary entries of the Pet stoljeća hrvatske književnosti [Five centuries of the Croatian literature] edition



  • 3–12 July 2018. COST Action IS1401: Strengthening Europeans' Capabilities by Establishing the European Literacy Network. Short-term Scientific Mission: The role of orthography in the three social psychology theories: proof of concept for the project proposal writing. School of Cultures, Languages and Area Studies, University of Nottingham, United Kingdom


Specializations (workshops, seminars, summer schools, etc., excluding webinars)

Polytechnic courses taught

  • 2004/2005 – 2005/2006. Kultura hrvatskoga jezika [The Croatian Language Culture], Polytechnic of Zagreb, 2 ECTS points. Undergraduate professional study programme
  • 2005/2006 – 2014/2015. Jezik i računalo [Language and Computation], Polytechnic of Zagreb, 2 ECTS points. Undergraduate professional study programme
  • 2008/2009 – 2014/2015. Uvod u (X)HTML i CSS [Introduction to (X)HTML and CSS], Polytechnic of Zagreb, 4 ECTS points. Undergraduate professional study programme

Other courses taught

Lectures, presentations, talks and posters (other than published papers)


  • 20-27 August 2018. Round table presentation at the XVI International Congress of Slavists: The end of WW1 and the Slavic World. Presentation title: Codification of punctuation in the language of Croats and Serbs between the two world wars: experience of language conflicts in creation of future language policies in the countries of the former Yugoslavia. Belgrade, Serbia


Funded research project participations

  • 1 October 2015 – 30 September 2018. SenseHive: Dynamic Crowdsourcing Models for Incremental Construction of Lexico-Semantic Resources, project leader Jan Šnajder, Faculty of Engineering and Computing, University of Zagreb. Funded by the Croatian Science Foundation, 710.000 HRK
  • 1 April 2009 – 31 March 2010. Izgradnja hrvatskoga kemijskoga nazivlja [Building of the Croatian Chemical Terminology], project leader Lidija Varga-Defterdarović, Ruđer Bošković Institute. Funded by the National Science Foundation of the Republic of Croatia, 100.000 HRK
  • 15 January 2008 – 15 January 2009. Hrvatsko strukovno nazivlje – projekt koordinacije [Croatian Specialized Terminology – Coordination project]. Funded by the Croatian Science Foundation, 1 million HRK
  • 2007 – 2010. Hrvatsko nazivlje u analitičkoj kemiji [Croatian terminology of analytical chemistry], project leader Marija Kaštelan-Macan, Faculty of Chemical Engineering and Technology, University of Zagreb. zProjekti. Funded by the Ministry of Science, Education and Sports of Republic of Croatia
  • 2007 – 2010. Semantičke mreže i računalna leksikologija [Semantic networks and computational lexicology], project leader Damir Ćavar, University of Zadar and Institute of Croatian Language and Linguistics. Funded by the Ministry of Science, Education and Sports of the Republic of Croatia, MZOŠ RH 2120920-0930
  • 2005 – 2006. Hrvatska jezična mrežna riznica [Croatian language repository online], project leader Dunja Brozović Rončević, Institute of Croatian Language and Linguistics. Funded by the Ministry of Science, Education and Sports of the Republic of Croatia, MZOŠ RH 0212010
  • 1998 – 2002. Strojno razumijevanje hrvatskoga jezika [Machine understanding of the Croatian language], project leader Zdravko Dovedan, Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb. Funded by the Ministry of Science and Technology of the Republic of Croatia, MZT RH 0130440
  • Označena baza i sintaktički ustroj hrvatskih rečenica [Annotated database and syntactical structure of sentences in Croatian], project leader Božidar Tepeš, Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb. Funded by Ministry of Science and Technology of Republic of Croatia, MZT RH 0130441
  • Označivanje i prepoznavanje riječi hrvatskoga jezika [Annotation and word recognition in Croatian], project leader Božidar Tepeš, Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb. Funded by Ministry of Science and Technology of Republic of Croatia, MZT RH 130015

Other project participations

Awards and acknowledgments

  • 2013. 'Ivan Filipović' Annual Award of the Ministry of Science and Education of the Republic of Croatia in the field of scientific and professional merits for the work on the Croatian Orthographic Manual
  • 1995. Dean's Award of University of Zagreb




Committees, councils, working groups



Manuscript reviews




Stojanov, Tomislav. 2018. Pravopisna aplikacija Školske knjige [Spelling mobile application of Školska knjiga]. Software review. To be printed in: Jezik. Časopis za kulturu hrvatskoga književnog jezika. ISSN 0021-6925. Zagreb: Hrvatsko filološko društvo


Stojanov, Tomislav. 2016. Dijakronijska gledišta o pisanju navodnika i polunavodnika u hrvatskome jeziku [Diachronic aspects of writing of single and double quotation marks in the Croatian language]. In: Filologija 66(2016), ISSN 0449-363X, https://doi.org/10.21857/mnlqgcj7py, pp. 59–101

This paper on the diachronic aspects of writing single and double quotation marks is the first part of a study on quotation characters in the Croatian language. Based on a search of old manuscripts and printed texts written in Croatian, primary codification books and secondary literature sources, we tried to present the usage and meaning of quotation marks. The theoretical framework of this research can be characterized as grapholinguistic or orthographologic. The aim was to draw conclusions about their historical development and relationship with today’s standardization practice.
It resulted in 11 various quotation mark pairs, of which six are hapax legomena, and the remaining five of which are present in modern Croatian orthographic handbooks. Although many consider „quotation marks” traditional Croatian quotation mark forms, they are only present after Boranić (1930), who ended 150 years of the continuous use of „quotation marks“ in Croatian orthographic books. As opposed to the first quotation marks, which appeared in Šilobod's Aritmetika (1758), single quotation marks came much later with Kušar (1889). Eight single quotation mark pairs were found, of which two are hapax legomena, with six total meanings.
Twenty-one meanings of quotation marks are described and categorized, of which eighteen are used in Croatian orthographic books from Kratki navuk and Uputjenje (both from 1779) to the Institute of Croatian Language and Linguistics' 2013 Hrvatski pravopis. Croatian orthographic books describe rules for eleven of them in a number of meanings ranging from four (Tutavac and Anić-Silić) to ten (Cipra-Klaić).

Stojanov, Tomislav. 2016. Sinkronijska gledišta o pisanju i normiranju navodnika i polunavodnika u hrvatskome jeziku [Synchronic aspects of writing and standardization of single and double quotation marks in the Croatian language]. In: Jezik. Časopis za kulturu hrvatskoga književnog jezika 63(2–3). ISSN 0021-6925. Zagreb: Hrvatsko filološko društvo, pp. 56–76

A paper on synchronic aspects of writing and standardization of single and double quotation marks is the second and final part of the study on quotation characters in the Croatian language.
Quotation marks are examined from three research perspectives: the orthographic and sociolinguistic perspective, the linguographic and computational perspective, as well as the terminological perspective.
Of the thirty characters in five punctuation subcategories with the feature of a quotation mark in the Unicode system, fifteen are Latin (8 quotation marks and 7 single quotation marks). Croatian orthographic books use six of eight quotation marks („ “ » « ” " plus two graphemes that do not exist in Unicode) and all seven single quotation marks (‚ ‛ ’ ‘ ' › ‹ plus one other non-standardized grapheme).
Two models of nomenclature for the terminological norming of all existing quotation marks are suggested (not only for signs that have been used or are still used in the Croatian language): one that is founded in a graphic, graphemic description, and one that is founded in terminological transparency.
In place of a discussion on the choice of graphemes in the Croatian linguistic norm, all relevant quotation marks and single quotation marks are evaluated by seven criteria (orthographic tradition and continuity, frequency, transparency, legibility, typographic aesthetics, computational acceptance, and distinctiveness), and three normative models are suggested for the Croatian graphemic standard for quotation marks.

Stojanov, Tomislav. 2016. Metodologija pravopisne standardizacije u hrvatskome jeziku [The methodology of the orthographic standardization in the Croatian language]. In: Metodologija i primjena lingvističkih istraživanja. Zbornik radova s međunarodnoga znanstvenoga skupa Hrvatskoga društva za primijenjenu lingvistiku održanoga od 24. do 26. travnja 2015. u Zadru. [Croatian Applied Linguistics Society's conference proceedings. An international conference held on 24-26 April 2015 in Zadar, Croatia]. Edited by Sanda Lucija Udier and Kristina Cergol Kovačević. Zagreb: srednja europa, pp. 19–34

A study on the methodology of orthographic standardization in selected European languages serves as background for a larger picture of the Croatian language situation and can possibly also point to certain solutions when considering the methodology of the orthographic standardization in the Croatian language.
Although a very broad categorization can be established for European orthographic methodologies, six methodological perspectives of orthographic standardization stand out for the present status of the Croatian language with regard to standardization (initiative, authority, acceptance, engagement scope, establishing standard model and authorship), which are insufficiently discussed in domestic literature.
It is stressed that the regulation of orthographic policy by means of laws has a positive impact on the stability of the orthographic standard and that it is not possible to implement high-quality orthographic standardization without a language authority in the community.
It is concluded that the establishment of a regulatory center and the creation of fundamental documents on orthographic planning (the green and the white book, development strategy, orthography dispute resolution, and others) could have a crucial impact on the further successful development of Croatian orthographic standardization methodology.

Stojanov, Tomislav. 2015. Jezičnopovijesni i računalnojezikoslovni aspekti opisa i normiranja pisanja vodoravnih crta u hrvatskome jeziku [Language Historical and Computational Linguistic Aspects of the Descriptions and Norming of Dashes in the Croatian Language]. In: Rasprave Instituta za hrvatski jezik i jezikoslovlje 41/1(2015). ISSN 1331-6745. Zagreb: Institut za hrvatski jezik i jezikoslovlje, pp. 127–161

This paper describes one of two punctuation marks (dashes and quotation marks) that deviate significantly from the relationship of one character per (Unicode) semantic value. While quotation marks have multiple graphemes (eight, specifically) for one semantic value, dashes typically have two graphemes (a short and a long dash) that cover as many as 11 (Unicode and Latin) dash characters. While the criterion of line length has typically been highly prominent in orthography manuals, it is only found in the presented categorization on the sixth hierarchical level.
Aside from two new Unicode dash characters (the two-em dash and three-em dash, Unicode 6.1, January 2012) having been standardized in the meantime, differing methodology and a comparison of the linguistic-historical and computational linguistic aspects have spread awareness of dash characters in the Croatian language as described in Portada-Stojanov (2009). A categorization is presented that is sensitive to the dichotomy of graphic representation and meaning that divides all dash characters into five hierarchical levels. Among the 44 Unicode horizontal and unbroken dash characters, a division into type, time, functionality, direction, and line height has resulted in 11 contemporary Latin alphabetic horizontal central characters, among which each language written in the Latin alphabet chooses its own. The semantic value and usage of all Unicode dash graphemes has been described.
On the other hand, the paper also described dash characters from the perspective of Croatian historical linguistics and orthography. In comparison to the rich repository of standardized Unicode dash characters, it has been shown that orthographic standards are significantly reductive. Orthographic norming of dash characters is divided into two periods and three groups, depending on their graphemic form (the first and second generation of orthography manuals) and terminology (the pre-standard phase and the two standard norming schools, depending on the acceptance of the terminological pairs “spojnica – crtica” and “crtica – crta”).
The historical linguistic and computational linguistic comparative research and the contrastive analysis of the Unicode standardization of dash characters with traditional orthographic descriptions of dash characters was intended to highlight (i) the need for a broader, interdisciplinary approach to describing written linguistic practice, (ii) the insufficiency of descriptions in primary and secondary school orthography manuals for modern writing, and (iii) the insufficiency of the existing Croatian codification of both terminological schools. In order for orthography manuals to be called scholarly, it is claimed that computer writing should be better described, and that a differentiation between characters and graphemes should be introduced on the level of punctuation. One of the areas in which orthography manuals could bring themselves technologically up to date is the issue of the writing of compound words at the beginning of a broken line, and the paper provides eight reasons to abandon the current tradition.
Analysis has shown that it would be justified to base dash codification on three or four characters, which reduces the 11 Latin Unicode characters to basic groups of dashes – the short, medium, long, and very long dashes, referred to as c1, c2, c3 and c4.

Goranka Blagus Bartolec, Lana Hudeček, Kristian Lewis, Milica Mihaljević, Ermina Ramadanović, Matea Birtić, Jurica Budja, Barbara Kovačević, Ivana Matas Ivanković, Željko Jozić, Alen Milković, Irena Miloš, Tomislav Stojanov, Kristina Štrkalj Despot. 2013. Hrvatski pravopis [Croatian Orthography Manual]. Main editor Željko Jozić. ISBN 978-953-7967-04-8. Zagreb: Institut za hrvatski jezik i jezikoslovlje

Stojanov, Tomislav. 2013. Uvod u (X)HTML i CSS [Introduction to (X)HTML and CSS]. Skripta [Coursebook]. Priručnici Tehničkoga veleučilišta u Zagrebu / Manualia Polytechnici Studiorum Zagrabiensis. Zagreb: Tehničko veleučilište u Zagrebu. ISBN 978-953-7048-27-3, 96 pp.

Stojanov, Tomislav; Vučić, Zoran. 2012. Korpusnojezikoslovna obradba tekstova Sportskih novosti. N-gramsko modeliranje dohvaćanja podataka i vizualizacija [Corpus linguistic processing of 'Sportske novosti' texts. Information retrieval of N-gram models and visualization]. In: Filologija 59(2012). Zagreb: Hrvatska akademija znanosti i umjetnosti. ISSN 0449-363X, pp. 103–129

Portada, Tomislav; Stojanov, Tomislav. 2009. O vodoravnim crticama u hrvatskome pravopisu [On Horizontal Dashes in Croatian Orthography]. In: Filologija 52(2009). Zagreb: Hrvatska akademija znanosti i umjetnosti. ISSN 0449-363X, pp. 91–120

This paper is a review of the terminology and usage of three horizontal dashes (-, – and —) in Croatian orthographies and orthographic papers. Considerable contradictions and inconsistencies have been spotted in both the terminology and the practical use of horizontal dashes. The new Croatian Orthography, recently published by Matrix Croatica and written by Badurina, Marković and Mićanović, contributed even more to the confusion by prescribing solutions that deviate significantly from orthographic tradition and typographic practice. Practical, orthographic, and computational linguistic arguments have been stated and elaborated against these solutions. The authors propose the terms spojnica, en-crtica and em-crtica for the characters -, – and —. Two possible directions in the development of orthographic rules and usage have been pointed out. The authors have also drawn attention to some other inconsistencies in orthography which should be systematized and standardized.

Ćavar, Damir; Jazbec, Ivo-Pavao; Stojanov, Tomislav. 2009. CroMo – Morphological Analysis for Standard Croatian and its Synchronic and Diachronic Dialects and Variants. In: Finite-State Methods and Natural Language Processing - Post-proceedings of the 7th International Workshop FSMNLP 2008. Jakub Piskorski, Bruce Watson, Anssi Yli-Jyrä (eds.). Italy: IOS Press. ISBN 978-1-58603-975-2, pp. 183–190

Stojanov, Tomislav; Lewis, Kristian; Portada, Tomislav. 2009. Rad na Struni na primjeru hrvatskoga kemijskoga nazivlja [Research on Struna Project in a Context of Croatian Chemical Terminology]. In: Terminologija in sodobna terminografija : zbornik [conference proceedings]. Nina Ledinek, Mojca Žagar, Marjeta Humar (eds.). Ljubljana: Založba ZRC, ZRC SAZU. ISBN: 978-961-254-158-3, pp. 181–195

Stojanov, Tomislav. 2007. Načela određivanja sintaktičkih jedinica i analiza glagolne skupine [Principles of Syntactic Units Determination and Analysis of Verbal Phrase]. In: Sintaktičke kategorije. Zbornik radova znanstvenoga skupa s međunarodnim sudjelovanjem 'Hrvatski sintaktički dani'. [Proceedings at the conference held on 11-12 May 2006]. Branko Kuna (ed.). Osijek – Zagreb: Filozofski fakultet Sveučilišta u Osijeku i Institut za hrvatski jezik i jezikoslovlje, pp. 227–239

Stojanov, Tomislav. 2006. Saussureova sintagmatika i pitanje naziva jedinica sintakse skupine [De Saussure's Syntagmatics and the Phrase-Structure Syntax Terminology Issue]. In: Filologija 46-47(2006), Zagreb: Hrvatska akademija znanosti i umjetnosti. ISSN 0449-363X, pp. 271–284

Dovedan, Zdravko; Stojanov, Tomislav; Vučković, Kristina. 2005. Syntax Analysis Directed by Transition and Action Table. In: Informacijske znanosti u procesu promjena. Jadranka Lasić-Lazić (ed.). Zagreb: Filozofski fakultet, Zavod za informacijske studije Odsjeka za informacijske znanosti, pp. 169–179

Vučković, Kristina; Ujdur, Ante; Stojanov, Tomislav; Dovedan, Zdravko. 2005. Interaktivni dječji slikovni rječnik [Interactive Children's Picture Dictionary]. In: Proceedings of the 28th International Convention MIPRO 2005: Computers in Education. Marina Čičin-Šain, Ivana Turčić Prstačić, Pavle Dragojlović (eds.). Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, pp. 55–59

Stojanov, Tomislav. 2002. Razredba zamjenica i zamjeničnih pridjeva, te pravila sintaktičke povratnosti [Pronouns and Pronominal Adjectives Classification, and Syntactic Reflexiveness Rule]. In: Suvremena lingvistika 51-52(2002), Zagreb: Hrvatsko filološko društvo. ISSN 0586-0296, pp. 227–243

Stojanov, Tomislav. 2002. Formalna morfosintaktička i sintaktička obrada rečenica hrvatskoga jezika [Formal Morphosyntactic and Syntactic Processing of Croatian Language Sentences]. In: Zbornik radova 'Težakovi dani' [Proceedings]. Slavko Tkalac, Jadranka Lasić-Lazić (eds.). Zagreb: Zavod za informacijske studije Odsjeka za informacijske znanosti. ISBN 953-175-182-X, pp. 135–147

Tepeš, Božidar; Mateljan, Vladimir; Stojanov, Tomislav; Tepeš, Tomislav; Kesić, Bogdana. 2001. Database of Grammatical Sentences of Croatian Language. In: Conference Proceedings 23rd Int. Conf. Information Technology Interfaces ITI 2001. Kalpić, Damir (ed.). Pula: Srce, pp. 423–432

Stojanov, Tomislav. 1995. Korpus priloga na -ice u hrvatskome jeziku [Corpus of the -ice adverbs in Croatian]. Unpublished paper awarded the Dean's Award of the University of Zagreb in 1995.









