Liana Lortkipanidze

Academic Doctor of Science

Archil Eliashvili Institute of Control Systems of the Georgian Technical University


Automatic classification of Russian lexical units by grammar features. Lortkipanidze L. Article. Ltd. "Sani" / Proceedings of the Institute of Control Systems of the Georgian Academy of Sciences, 2002, N6, pp. 199-206. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Georgian corpora tagger-parser. Lortkipanidze L. Article. "Metsniereba" / Proceedings of the Institute of Control Systems of the Georgian Academy of Sciences, 2003, N7, pp. 189-197. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Record and reproduction of morphological functions. Lortkipanidze L. Conference proceedings. Proceedings of the 5th Tbilisi Symposium on Language, Logic and Computation, ILLC, University of Amsterdam; CLLS, Tbilisi State University, 2003, pp. 105-111. ISBN 90-6776-130-0. Language: English. State Targeted Program.
Georgian-Russian-English Literal Interpretation. Lortkipanidze L. Conference proceedings. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2004, N8, pp. 164-168. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Morphological Processes and Morphological Signs. Lortkipanidze L. Article. Publ. House "Inteleqti" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Academy of Sciences, 2005, N9, pp. 275-282. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Application of GeoTrans System in the Georgian "Spell Checker". Lortkipanidze L. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2006, N10, pp. 187-192. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Computer Prompter for Georgian Language. Lortkipanidze L. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2006, N10, pp. 187-192. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Presentation of Language Morphology in the Expert System. Lortkipanidze L. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2008, N12, pp. 175-181. ISSN 0135-0765. Language: Georgian. Grant Project.
Three Aspects of Language Modelling. Lortkipanidze L., Javashvili N., Chikoidze G., Dokvadze E. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2008, N12, pp. 149-160. ISSN 0135-0765. Language: English. Grant Project.
Recognition of Language Morphological Characteristics' Patterns. Lortkipanidze L. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2008, N12, pp. 169-175. ISSN 0135-0765. Language: Georgian. Grant Project.
Modeling of derivation in the Multilingual Expert Systems. Lortkipanidze L., Amirezashvili N., Samsonadze L. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2009, N13, pp. 154-161. ISSN 0135-0765. Language: English. State Targeted Program.
The Computer Realization of Applied Dictionaries of "Computer Prompter" for Georgian Language. Lortkipanidze L. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2009, N13, pp. 154-161. ISSN 0135-0765. Language: Georgian. Grant Project.
Realization of Structure of Morphological Zone of Computer-Explanatory Dictionary. Lortkipanidze L. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2009, N13, pp. 162-166. ISSN 0135-0765. Language: Georgian. Grant Project.
Interactive System for Compilation of Multilingual Concordancers Dictionary. Lortkipanidze L. Article. Publ. House "Inteleqti" / Proceedings of the LEPL Archil Eliashvili Institute of Control Systems, 2010, N14, pp. 188-192. ISSN 0135-0765. Language: Georgian. Grant Project.
Developing of the Manager of Georgian literary text corpus. Lortkipanidze L., Eremian R. Conference proceedings. Papers of the International Scientific Conference "Contrastive Research and Applied Linguistics", Minsk, June 2014. ISBN 978-985-460-669-9. Language: Russian. Grant Project.
The model of the automatic removal of homonymy in the corpora. Lortkipanidze L. Article. Ltd "Damani" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2014, N18, pp. 187-193. ISSN 0135-0765. Language: Georgian. Grant Project.
Computer Linguistics and Language Modeling. Lortkipanidze L., Javashvili N., Chikoidze G. Article. Ltd "Damani" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2014, N18, pp. 43-50. ISSN 0135-0765. Language: Georgian. Grant Project.
Georgian Corpus of Linguistic Meta-Language: Problems and Prospects. Lortkipanidze L. Conference proceedings. Papers of the International Scientific Conference "Contrastive Research and Applied Linguistics", Minsk, Belarus. ISBN 978-985-460-669-9. Language: Russian. Grant Project.
On Multicriteria Algorithm for Specific Problem of Scheduling Theory. Lortkipanidze L. Article. GESJ: Computer Science and Telecommunications, 2014, No. 3(43). ISSN 1512-1232. Language: English. State Targeted Program.
The Georgian Dialect Corpus: Problems and Prospects. Lortkipanidze L., Beridze M., Nadaraia D. Conference proceedings. Jost Gippert / Ralf Gehrke (eds.) (= CLIP, Vol. 5), Tübingen: Narr Francke Attempto Verlag GmbH & Co. KG, 2015, pp. 323-334. ISBN 978-3-8233-6922-6. Language: English. Grant Project.
Dialect Dictionaries in the Georgian Dialect Corpus. Lortkipanidze L., Beridze M., Nadaraia D. Conference proceedings. Theoretical Computer Science and General Issues. 10th International Tbilisi Symposium on Logic, Language, and Computation, TbiLLC 2013, Revised Selected Papers. Springer-Verlag Berlin Heidelberg, 2015, pp. 82-96. Quartile: Q4. ISBN 978-9941-20-575-0. Language: English. Grant Project.
WordNet Thesaurus Technology Standards. Lortkipanidze L., Javashvili N. Conference proceedings. Publishing House "Technical University" / Proceedings of the International Scientific Conference "Information and Computer Technologies. Modelling, Control" Dedicated to the 85th Anniversary of Academician I.V. Prangishvili, 2015, pp. 441-444. ISBN 978-9941-20-575-0. Language: Georgian. Grant Project.
Algorithmization of a vector space model for textual information processing. Lortkipanidze L. Conference proceedings. Publishing House "Technical University" / Proceedings of the International Scientific Conference "Information and Computer Technologies. Modelling, Control" Dedicated to the 85th Anniversary of Academician I.V. Prangishvili, 2015, pp. 441-444. ISBN 978-9941-20-575-0. Language: Georgian. Grant Project.
Vector space model and Georgian text processing. Lortkipanidze L. Article. Publ. House "Universali" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2015, N19, pp. 105-108. ISSN 0135-0765. Language: Georgian. Grant Project.
Lexical Functions as an Important Component of Combinatorial Dictionary. Chikoidze G., Amirezashvili N., Lortkipanidze L., Samsonadze L., Chutkerashvili A., Javashvili N. Article. Publ. House "Universali" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2015, N19, pp. 98-104. ISSN 0135-0765. Language: Georgian. State Targeted Program.
The Algorithm and Program Realization of the Automatic Formation of Hyponymy Tree in Accordance with the Structure of Thesaurus WordNet. Chikoidze G., Lortkipanidze L. Article. Ltd "Damani" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2016, N20, pp. 19-24. ISSN 0135-0765. Language: Georgian. Grant Project.
The Lexical Ontology of GeWordNet. Lortkipanidze L., Gegechkori M. Article. Ltd "Damani" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2016, N20, pp. 148-152. ISSN 0135-0765. Language: Georgian. Grant Project.
Syntax Annotation of the Georgian Literary Corpus. Lortkipanidze L., Amirezashvili N., Chutkerashvili A., Javashvili N., Samsonadze L. Conference proceedings. Springer / Logic, Language and Computation, 11th International Tbilisi Symposium, TbiLLC 2015, Revised Selected Papers, 2017, LNCS 10148, pp. 89-97. Quartile: Q4. ISSN 0302-9743, E-ISSN 1611-3349, ISBN 978-3-662-54331-3, ISBN 978-3-662-54332-0 (e-book), DOI 10.1007/978-3-662-54332-0. Language: English. Grant Project.
Morphological Analyser of Kartvelian Languages. Lortkipanidze L. Article. Ltd "Poligrapi" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2017, N21, pp. 108-111. ISSN 0135-0765. Language: Georgian. State Targeted Program.
The Georgian Text Corpora. Lortkipanidze L., Kloyan L., Kloyan M. Article. Ltd "Poligrapi" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2017, N21, pp. 112-116. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Bidirectional Georgian-English automatic translation of derivative forms. Lortkipanidze L., Javashvili N., Chutkerashvili A., Aidarashvili G. Article. Ltd "Poligrapi" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2018, N22, pp. 127-132. ISSN 0135-0765. Language: Russian. State Targeted Program.
Morphological Analyzer of Georgian Language's Subsystems. Lortkipanidze L., Makrakhidze L. Article. Publ. House "Macne-printi" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2019, N23, pp. 115-118. ISSN 0135-0765. Language: English. State Targeted Program.
Combinatorial Dictionary of Georgian Language. Lortkipanidze L. Article. Publ. House "Sachino" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2020, N24, pp. 98-104. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Using the GeWordNet thesaurus in Georgian dialogue system. Lortkipanidze L. Article. Publ. House "Sachino" / Proceedings of the Archil Eliashvili Institute of Control Systems of the Georgian Technical University, 2020, N24, pp. 90-97. ISSN 0135-0765. Language: Georgian. State Targeted Program.
Linguistic Knowledge Base for Georgian Language. Lortkipanidze L. Article. ISSN 0135-0765. Language: Georgian. State Targeted Program.

Arnold Chikobava Readings XXVI. Tbilisi, Georgia, 28 April-1 May 2015. TSU Arnold Chikobava Institute of Linguistics. On the principles of automatic and semi-automatic morphological annotation in the Georgian dialect corpus (oral).

The current stage of building the Georgian dialect corpus (GDC) involves solving conceptual and practical issues of morphological annotation and overcoming grammatical homonymy. For the morphological analysis of the GDC we use the GeoTrans system, which processes the consolidated word list of the corpus. This list combines two kinds of elements: textual and lexical. Textual data are the word forms attested in the contexts, while lexical data are the main form of a dictionary entry (the lemma) together with its phonetic, grammatical and derivational variants. Accordingly, the material of the second group comes with grammatical information: the lemma is tagged with a first-level marker (the marker of its grammatical class), and its grammatical and derivational variants carry both descriptive and paradigm markers. Processing words of textual origin jointly with lexical data makes it possible to identify dialectal forms. In the GDC concept, morphological analysis is a consistent combination of automatic, semi-automatic and manual annotation. The report describes the features of the technological system for morphological analysis, presents the results of the initial automatic analysis, and discusses the checking of automatically annotated lists of non-homonymous and homonymous word forms and the final assignment of markers in the corpus contexts. We expect approximately 20-30 percent of the corpus's textual material to be annotated by the initial automatic analysis. The analysis performed at this stage for five dialects is currently being checked: homonymy is resolved manually, and the correct analyses are automatically propagated to the contexts. A frequency-analysis experiment, which involved extracting and processing the list of words attested in more than 1,000 contexts, is discussed separately.
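The joint processing of textual and lexical data, the propagation of a confirmed analysis to all contexts, and the frequency list can be sketched in Python. This is a minimal illustration under assumed data structures, not the GeoTrans implementation: textual data are (context id, word form) tokens, lexical data map each variant to its lemma, markers and a dialect flag, and all forms and marker strings are placeholders. The report's threshold of 1,000 contexts is replaced by 1 to keep the example small.

```python
from collections import defaultdict

# Hypothetical lexical data: variant form -> (lemma, markers, is_dialectal).
# The forms are placeholders, not real Georgian, GeoTrans or GDC data.
LEXICAL_DATA = {
    "form1": ("lemma1", "N.NOM.SG", False),
    "form1x": ("lemma1", "N.NOM.SG", True),   # a dialectal phonetic variant
    "form2": ("lemma2", "V.PRS.3SG", False),
}

def annotate(tokens):
    """Join textual tokens (context_id, form) with the lexical data.

    Returns per-context annotations (None for unknown forms) and the
    set of forms identified as dialectal.
    """
    annotations = defaultdict(list)
    dialectal = set()
    for ctx, form in tokens:
        entry = LEXICAL_DATA.get(form)
        annotations[ctx].append((form, entry[:2] if entry else None))
        if entry and entry[2]:
            dialectal.add(form)
    return dict(annotations), dialectal

def propagate(annotations, form, analysis):
    """Write a manually confirmed analysis of `form` into every context."""
    for ctx in annotations:
        annotations[ctx] = [(f, analysis if f == form else a)
                            for f, a in annotations[ctx]]

def frequent_forms(tokens, min_contexts):
    """Forms attested in more than `min_contexts` distinct contexts."""
    contexts = defaultdict(set)
    for ctx, form in tokens:
        contexts[form].add(ctx)
    return sorted(f for f, c in contexts.items() if len(c) > min_contexts)

tokens = [(1, "form1"), (1, "form2"), (2, "form1x"), (3, "unknown"), (3, "form1")]
ann, dial = annotate(tokens)
propagate(ann, "unknown", ("lemma3", "ADV"))
print(dial)                        # {'form1x'}
print(frequent_forms(tokens, 1))   # ['form1']
```

Keeping the analysis keyed by form, rather than by occurrence, is what lets a single manual decision update every context at once.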

https://ice.ge/of/?page_id=254
Arnold Chikobava Readings XXVI. Tbilisi, Georgia, 28 April-1 May 2015. TSU Arnold Chikobava Institute of Linguistics. Georgian Grammar Online Dictionary (oral).

At the present stage of technological development, many intelligent systems have been created and implemented on computers. Among them are computer language systems, whose deep theoretical basis is language modeling, i.e. artificial systems that replicate the basic aspects of language behavior: knowledge of language, the use of that knowledge to analyze and synthesize expressions, and the acquisition of knowledge. The presented system focuses on the practical realization of knowledge acquisition. Computerization of the dictionary served as the starting point for building a computer system for language teaching. While ordinary "book" dictionaries are very valuable for this purpose, they have two serious drawbacks: incomplete information and "passivity". In a dictionary, each lexical item is usually represented only by the initial word form (lemma) of its paradigm, from which it is difficult to reconstruct the complete paradigm, especially for Georgian. Developed countries started working on online grammar dictionaries long ago, and such dictionaries exist for almost all major international languages (e.g. Russian: http://www.morfologija.ru/словоформа/олень; German: http://www.canoo.net/services/Controller?input=mami&service=inflection). Such systems are popular with people from all walks of life. On the Internet, alongside bilingual dictionaries, systems for teaching foreign languages occupy an important place, and a grammar dictionary is a basic element of any computer language-learning system. Many groups in Georgia and abroad are currently working on such a dictionary. However, no product combining the Georgian vocabulary with its morphological generator has yet appeared on the Internet.
Our product will be the first interactive online program for learning Georgian, giving the learner fundamental knowledge of the vocabulary and of the grammatical variation of words. A package of software tools is currently being developed that will let users analyze and synthesize Georgian word forms online at the level of both inflection and derivation. For any word, the online dictionary finds the corresponding lexical base and displays all members of its paradigm. The dictionary already contains 100,000 headwords and all the rules for their inflection. The GeoTrans system, which we developed and which is not tied to a single task, allows the number of headwords in the dictionary to be increased dynamically at any time, in principle without limit; this extensibility is one reason such systems receive so much attention for other languages as well.
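The two directions the online dictionary supports, generating a full paradigm from a lemma and finding the lemma of a given word form, can be illustrated with a minimal Python sketch. It assumes a toy rule set: the singular case endings of a Georgian consonant-stem noun in Latin transliteration, with kaci 'man' (stem kac-) as the example. This is not the GeoTrans rule format, and real Georgian paradigms involve stem alternations that these rules ignore.

```python
# Toy inflection rules (assumed, not GeoTrans): case -> ending for
# Georgian consonant-stem nouns, singular, Latin transliteration.
NOUN_ENDINGS = {
    "NOM": "i", "ERG": "ma", "DAT": "s", "GEN": "is",
    "INS": "it", "ADV": "ad", "VOC": "o",
}

# Hypothetical dictionary: lemma -> stem.
LEMMAS = {"kaci": "kac"}

def paradigm(lemma):
    """Generate every member of the lemma's (toy) paradigm."""
    stem = LEMMAS[lemma]
    return {case: stem + ending for case, ending in NOUN_ENDINGS.items()}

def build_index():
    """Invert all paradigms: word form -> list of (lemma, case)."""
    index = {}
    for lemma in LEMMAS:
        for case, form in paradigm(lemma).items():
            index.setdefault(form, []).append((lemma, case))
    return index

INDEX = build_index()

print(paradigm("kaci")["ERG"])   # kacma
print(INDEX["kacis"])            # [('kaci', 'GEN')]
```

Adding a headword only means adding one entry to `LEMMAS`; the paradigm and the reverse index follow from the rules, which is what lets a rule-based dictionary grow without listing forms by hand.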

https://ice.ge/of/?page_id=254
International Scientific Conference "Information and Computer Technologies. Modelling, Control" Dedicated to the 85th Anniversary of Academician I.V. Prangishvili. Tbilisi, Georgia, 3-5 November 2015. Georgian Technical University; Georgian Engineering Academy; International Engineering Academy. Algorithmization of a vector space model for textual information processing (oral).

The report describes the main stages of building semantic vectors for language units. It discusses a method of forming multidimensional vectors that represent the semantic proximity of words, gives a general overview of generalized vector space models, and sets out a general scheme of the vector-based text processing algorithm and its software support.
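As a hedged illustration of a vector space model, the Python sketch below builds word vectors from co-occurrence counts within a fixed window and compares them by cosine similarity. The window size, the toy English sentences and the function names are assumptions chosen for illustration; the report's own weighting scheme and dimensionality are not specified in the abstract.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][words[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

corpus = ["the cat drinks milk", "the dog drinks water", "the cat eats fish"]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))   # 0.612
```

"cat" and "dog" come out similar because they share the neighbours "the" and "drinks"; real systems usually reweight raw counts (for example with PMI or tf-idf) before comparing.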

http://ict-mc.gtu.ge/conference.pdf
International Scientific Conference "Information and Computer Technologies. Modelling, Control" Dedicated to the 85th Anniversary of Academician I.V. Prangishvili. Tbilisi, Georgia, 3-5 November 2015. Georgian Technical University; Georgian Engineering Academy; International Engineering Academy. WordNet Thesaurus Technology Standards (oral).

The report describes the methodology for developing the Georgian WordNet thesaurus, GeWordNet. It explains how traditional dictionaries and thesauri differ from the WordNet thesaurus and lists the basic principles of Princeton WordNet. The groups of linguistic sources needed to present information about the language system are discussed. The WordNet development standards are characterized: definitional, contextual and derivational methods of meaning analysis. The types of semantic, paradigmatic and syntagmatic relations used in the thesaurus are described.
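The hypernym/hyponym relation that structures WordNet-style thesauri can be sketched as a small graph. The Python below is a hypothetical, minimal model, not the GeWordNet data format: the synset identifiers, glosses and English lemmas are invented for illustration. Each synset records its hypernym, and the hyponymy tree is derived by inverting those links.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Synset:
    """A set of synonymous lemmas with a gloss and an optional hypernym."""
    ident: str
    lemmas: list
    gloss: str
    hypernym: Optional[str] = None

SYNSETS = [
    Synset("animal.n.1", ["animal"], "a living organism that can move"),
    Synset("dog.n.1", ["dog", "hound"], "a domesticated canine", "animal.n.1"),
    Synset("cat.n.1", ["cat"], "a small domesticated feline", "animal.n.1"),
    Synset("puppy.n.1", ["puppy"], "a young dog", "dog.n.1"),
]

def hyponymy_tree(synsets):
    """Invert hypernym links: synset id -> sorted list of hyponym ids."""
    children = defaultdict(list)
    for s in synsets:
        if s.hypernym is not None:
            children[s.hypernym].append(s.ident)
    return {k: sorted(v) for k, v in children.items()}

def render(tree, root, depth=0):
    """Return the subtree below `root` as indented lines."""
    lines = ["  " * depth + root]
    for child in tree.get(root, []):
        lines.extend(render(tree, child, depth + 1))
    return lines

tree = hyponymy_tree(SYNSETS)
print("\n".join(render(tree, "animal.n.1")))
```

Storing only the upward (hypernym) link and deriving the downward tree avoids keeping two copies of the same relation in sync.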

http://ict-mc.gtu.ge/conference.pdf
HUMANITIES IN THE INFORMATION SOCIETY-2. Batumi, Georgia, 24-26 October 2014. Batumi Shota Rustaveli State University, The Faculty of the Humanities. Syntactic Analyzer of Georgian Sentence (oral).

The paper considers an automatic syntactic analyzer of the Georgian sentence. The program is designed for automatic syntactic markup (tagging) of Georgian texts and incorporates a syntactic model of Georgian as well as the morphological level. The input is a text corpus; the output is the text divided into sentences, in which each word form is given with its initial form and grammatical characteristics. The syntactic characteristics of a word form are determined by the relations connecting it to the other members of the sentence. For syntactic description we use phrase-structure trees and syntactic role structures.

The syntactic analysis tree is represented by binary directed connections between words. Each connection contains a parent word and a successor word. To obtain syntactic trees as output, rules for structuring the syntactic tree were formulated.
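A minimal sketch of this representation: each binary connection is a (parent, successor) pair of word positions, and a well-formed syntactic tree has exactly one root (a word with no parent), at most one parent per word, and no cycles. The `is_tree` check below is a hypothetical helper written for illustration; the analyzer's own tree-structuring rules are not reproduced.

```python
def is_tree(n_words, connections):
    """Check that directed (parent, successor) connections over word
    positions 0..n_words-1 form a single rooted tree."""
    parent = {}
    for p, s in connections:
        if not (0 <= p < n_words and 0 <= s < n_words):
            return False                 # position outside the sentence
        if p == s or s in parent:
            return False                 # self-loop or a second parent
        parent[s] = p
    roots = [w for w in range(n_words) if w not in parent]
    if len(roots) != 1:
        return False                     # exactly one word may lack a parent
    for w in range(n_words):
        node, seen = w, set()
        while node in parent:            # climb towards the root
            if node in seen:
                return False             # a cycle never reaches the root
            seen.add(node)
            node = parent[node]
    return True

# A three-word sentence whose verb (position 2) governs the subject (0)
# and the direct object (1).
print(is_tree(3, [(2, 0), (2, 1)]))           # True
print(is_tree(3, [(2, 0), (0, 2), (2, 1)]))   # False: cycle, no root
print(is_tree(3, [(2, 0), (1, 0)]))           # False: word 0 has two parents
```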

On the basis of these rules, and taking into account the rules of syntactic relations in Georgian, we compiled a table describing all possible connections and directed syntactic role structures. The table can be called "the Georgian Language Syntactic Role Structures Dictionary" (abbreviated GLSRSV).

A dictionary unit includes the following: the marker (abbreviation) of the name of the syntactic role of the successor word in the given syntactic structure, the syntactic role marker of the corresponding successor word, the marker of the corresponding parent word, the morphological characteristics of the successor word, etc. The syntactic annotation system for the text corpus consists of several modules: a graphematic analyzer, a morphological analyzer, the GLSRSV dictionary, and an approximate constructor of syntactic trees. The paper considers the principles of the program, which rest on the algorithm by which these modules interact, together with examples of automatic sentence analysis.
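How a GLSRSV-style dictionary can drive the approximate tree constructor can be sketched as a rule lookup. In this hypothetical Python model, a unit names a syntactic role, the part of speech of the parent word, and the parts of speech and cases allowed for the successor; a (parent, successor, role) link is proposed wherever a pair of analyzed words satisfies a unit. The role markers, the tag names (`MASD` stands for a verbal noun) and the two units are assumptions, loosely based on the direct-object description and on the remark that an ergative noun can only be a subject in the abstract on Georgian sentence structure; they are not the actual GLSRSV contents.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """A hypothetical GLSRSV-style dictionary unit."""
    role: str                    # syntactic role marker of the successor
    parent_pos: str              # part of speech of the parent word
    successor_pos: frozenset     # admissible parts of speech of the successor
    successor_cases: frozenset   # admissible cases of the successor

# Two toy units (assumed): a direct object in the nominative or dative,
# and a subject in the ergative, both governed by a verb.
UNITS = [
    Unit("DO", "V", frozenset({"N", "ADJ", "NUM", "PRON", "MASD"}), frozenset({"NOM", "DAT"})),
    Unit("SUBJ", "V", frozenset({"N", "PRON"}), frozenset({"ERG"})),
]

def propose_links(words, units):
    """Return (parent, successor, role) triples licensed by some unit.

    `words` is a list of (form, pos, case) analyses from the
    morphological analyzer; case is None where it does not apply.
    """
    links = []
    for p, (_, p_pos, _) in enumerate(words):
        for s, (_, s_pos, s_case) in enumerate(words):
            if p == s:
                continue
            for u in units:
                if p_pos == u.parent_pos and s_pos in u.successor_pos and s_case in u.successor_cases:
                    links.append((p, s, u.role))
    return links

# კაცმა წერილი დაწერა 'the man wrote a letter'
sentence = [("კაცმა", "N", "ERG"), ("წერილი", "N", "NOM"), ("დაწერა", "V", None)]
print(propose_links(sentence, UNITS))   # [(2, 0, 'SUBJ'), (2, 1, 'DO')]
```

With a realistic rule set several candidate links per word are normal, which is why the constructor is "approximate": a later step has to choose a consistent subset that forms a tree.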

http://www.nplg.gov.ge/ec/ka/bibl/catalog.html?pft=biblio&from=3591&rnum=10&udc=811.353.1
HUMANITIES IN THE INFORMATION SOCIETY-2. Batumi, Georgia, 24-26 October 2014. Batumi Shota Rustaveli State University, The Faculty of the Humanities. The syntactic structure of a Georgian sentence (oral).

The paper considers the syntactic structure of the Georgian sentence in terms of binary relations of linguistic structure, indicating the role of each word in each word connection. Syntactic relations between the words of a sentence correspond to a syntactic tree structure. The members of the sentence (the words) are presented as elements of noun phrases (NP) and verb phrases (VP). To maintain the integrity of the tree structure, the concept of a zero node (S, Sentence) is used: it is the parent of the verb phrase VP in the case of an impersonal verb, and otherwise the parent of both the NP and the VP. All members of the sentence, both principal and secondary, are described, and the syntactic role of each is always indicated, both as a parent and as a successor. Noun phrases and verb phrases can both act as parents, and both can also act as successors. The paper also shows in which syntactic constructions a given word participates, and all its possible roles, with the corresponding grammatical features, are assigned to it. For example, a direct object is a successor of the verb phrase VP (VP = V + N). It can be expressed by a noun, adjective, numeral, pronoun or verbal noun, in the singular or plural, and its case is nominative or dative. It can take postpositions (-vit, -tan, -ze, -ši) and particles (-a, -γa, -ve), as well as indirect speech particles (o, metki, tko). Knowledge accumulated during morphological analysis plays an important role in the syntactic annotation of the Georgian sentence, since it provides comprehensive syntactic information: for example, a noun in the ergative case can only be a subject. A syntactic annotation system with this structure and these grammatical characteristics allows a full description of any Georgian sentence.
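The zero-node convention can be sketched with nested tuples. This is a hypothetical Python illustration, not the annotation system's data model: `make_s` attaches the verb phrase alone to S for an impersonal verb and attaches NP plus VP otherwise, and the direct object is a successor of the VP (VP = V + N). The example sentences are კაცმა წერილი დაწერა 'the man wrote a letter' and თოვს 'it is snowing'; the order of children in the tree reflects structure, not Georgian word order.

```python
def make_vp(verb, direct_object=None):
    """VP = V (+ N): a direct object is attached as a successor of the VP."""
    children = [("V", verb)]
    if direct_object is not None:
        children.append(("NP", direct_object))
    return ("VP", *children)

def make_s(vp, subject=None):
    """Zero node S: parent of the VP alone for an impersonal verb,
    otherwise parent of both the subject NP and the VP."""
    if subject is None:
        return ("S", vp)
    return ("S", ("NP", subject), vp)

def brackets(node):
    """Render a tree in labelled-bracket notation (structure, not word order)."""
    label, *children = node
    parts = [c if isinstance(c, str) else brackets(c) for c in children]
    return "[" + label + " " + " ".join(parts) + "]"

print(brackets(make_s(make_vp("დაწერა", "წერილი"), subject="კაცმა")))
# [S [NP კაცმა] [VP [V დაწერა] [NP წერილი]]]
print(brackets(make_s(make_vp("თოვს"))))
# [S [VP [V თოვს]]]
```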

http://www.nplg.gov.ge/ec/ka/bibl/search.html?cmd=search&pft=biblio&qs=700%3A1%3A%E1%83%90%E1%83%9B%E1%83%98%E1%83%A0%E1%83%94%E1%83%96%E1%83%90%E1%83%A8%E1%83%95%E1%83%98%E1%83%9A%E1%83%98+%E1%83%90
The Second Scientific Conference in Exact and Natural Science. Tbilisi, Georgia, 29 January-3 February 2014. Ivane Javakhishvili Tbilisi State University. Morphological analyzer of Georgian language and its subsystems as a main component of a text corpus manager (oral).

The main purpose of linguistic text corpora is to support research on the vocabulary and grammar of a language. Corpus annotation can provide any type of analytical information about the text. To build a corpus research tool, a corpus manager, the texts included in the corpus must be morphologically marked up (annotated), which is especially difficult for the various subsystems of Georgian. The report deals with the development of a morphological analyzer for the subsystems of the Georgian language. It is assumed that every text unit whose analysis against the morphological dictionary of Modern Georgian fails belongs to a language subsystem (dialect). Accordingly, a method has been developed for filling and enriching the morphological dictionaries of the various dialects. Compiling the dictionary for a given subsystem of the language consists of four stages:

1. completing the dictionary of lemmas (basic forms) from existing dialect dictionaries, if any;

2. morphological annotation based on the literary and dialect dictionaries;

3. clustering all unidentified word forms and assigning to them hypothetical information about the part of speech, the lemma and other characteristics, based on lexeme patterns;

4. evaluating the hypotheses and adding the most plausible new lemmas and inflection rules to the morphological analyzer dictionary of the given dialect.
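Stages 3 and 4 can be sketched in Python under strong simplifying assumptions: unidentified word forms are clustered by the stem left after stripping a known ending, each cluster yields a lemma hypothesis, and only hypotheses supported by enough distinct forms are kept. The ending list (a few Georgian noun case endings in transliteration), the threshold, the lemma-building rule (stem + -i) and the invented forms are all hypothetical; the method's actual clustering criteria and lexeme patterns are not given in the abstract.

```python
from collections import defaultdict

# Assumed case endings (transliterated), longest first so that
# "-is" is tried before "-s".
ENDINGS = sorted(["i", "ma", "s", "is", "it", "ad", "o"], key=len, reverse=True)

def cluster_unknown(forms):
    """Stage 3: group unidentified forms by candidate stem."""
    clusters = defaultdict(set)
    for form in forms:
        for ending in ENDINGS:
            if form.endswith(ending) and len(form) > len(ending):
                clusters[form[: -len(ending)]].add(form)
                break
    return clusters

def hypotheses(clusters, min_forms=3):
    """Stage 4: keep stems attested with at least `min_forms` forms
    and propose a nominative lemma (stem + 'i') for each."""
    return {stem + "i": sorted(forms)
            for stem, forms in clusters.items() if len(forms) >= min_forms}

# Invented forms that failed standard analysis (placeholders only).
unknown = ["ptali", "ptalma", "ptals", "ptalis", "xyzad"]
print(hypotheses(cluster_unknown(unknown)))
# {'ptali': ['ptali', 'ptalis', 'ptalma', 'ptals']}
```

The frequency threshold is the simplest possible evaluation criterion; in practice the hypotheses would still be reviewed before new lemmas and rules enter the dialect dictionary.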

http://conference.ens-2014.tsu.ge/page/program/11
Contrastive Studies and Applied Linguistics. Minsk, Belarus, 29-30 October 2014. Minsk State Linguistic University. Linguistic Meta-Language Corpus of Georgian: Challenges and Solutions (oral).

To ensure the representativeness of a national corpus, it must include a certain segment of the metalanguages of various scientific disciplines, since the metalanguage of science is a significant fragment of the language. The paper discusses the creation of a Georgian corpus of the metalanguage of linguistics based on the electronic library of the works of Vissarion Arkadyevich Dzhorbenadze (1942-1992), one of the prominent Georgian linguists of the 20th century.

The system will be built as a web application, hosted on a server and available to any authorized user via the Internet. Our multi-component product will have both scientific and educational functions. The report discusses approaches to the following tasks:

1. Creation of a text electronic bank

2. Technical support

3. Creation of the working interface of the corpus and reader.

https://elib.grsu.by/katalog/497344pdf.pdf?d=true
7th Biennial IVACS Conference. Newcastle, United Kingdom, 19-21 June 2014. Newcastle University. Towards Creating a Large Corpus for Georgian (oral).

There is no large representative corpus of Georgian, the official language of Georgia and a member of the Kartvelian family. In this joint project of Tbilisi State University and the University of Leeds, we are building KaWaC, designed to be a large and diverse Georgian corpus collected from the Internet. The process started by identifying the most popular resources (over 1,000 links) and crawling them with wget, followed by web-page cleaning and deduplication based on BootCaT tools. We estimate the corpus at 150 million words from 200,000 web pages.
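The deduplication step can be illustrated with a minimal Python sketch that drops exact duplicates after case and whitespace normalization, keyed by a SHA-1 hash of the normalized text. This is a simplified stand-in chosen for illustration, not the BootCaT tools' procedure; pages that differ only slightly would need near-duplicate detection (for example shingling), which this sketch does not attempt.

```python
import hashlib

def normalize(text):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())

def deduplicate(pages):
    """Keep the first occurrence of each page whose normalized text repeats."""
    seen = set()
    kept = []
    for page in pages:
        digest = hashlib.sha1(normalize(page).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(page)
    return kept

pages = ["Tbilisi is the capital.", "tbilisi  is the   capital.", "Batumi is on the coast."]
print(deduplicate(pages))   # ['Tbilisi is the capital.', 'Batumi is on the coast.']
```

Storing fixed-size digests instead of full texts keeps memory use bounded when the crawl runs to hundreds of thousands of pages.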

By selecting diverse initial links, we aim to ensure that KaWaC covers a wide range of text types, topics and regions. The text types are described using Functional Genre Dimensions, such as Argumentative, Instructional, Legalistic, etc. The corpus will be morphologically annotated and lemmatized using the morphological analyzer GeoTrans, developed by a Georgian computational linguist at Tbilisi State University.

Processing challenges include rich inflectional variation, mainly in verbs, adjectives and nouns (e.g. seven cases, three series of verbs divided into ten classes, etc.), fragments of text in other languages (English, Russian), and barbarisms in informal language, particularly in texts from personal blogs and forums.
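One of these challenges, fragments in other languages, can be partly detected by script. The Python sketch below is an assumed heuristic, not part of the KaWaC pipeline: it labels each token by the Unicode range its letters fall in (Georgian Mkhedruli letters in U+10D0-U+10FF, Cyrillic in U+0400-U+04FF, otherwise Latin if all letters are ASCII) and reports the share of non-Georgian tokens. This separates English and Russian fragments, but Georgian typed in Latin letters, common in blogs and forums, would be misclassified as foreign.

```python
def script_of(token):
    """Classify a token by the script of its letters (assumed heuristic)."""
    letters = [ch for ch in token if ch.isalpha()]
    if not letters:
        return "other"
    if all("\u10d0" <= ch <= "\u10ff" for ch in letters):
        return "georgian"
    if all("\u0400" <= ch <= "\u04ff" for ch in letters):
        return "cyrillic"
    if all(ch.isascii() for ch in letters):
        return "latin"
    return "mixed"

def foreign_share(text):
    """Fraction of alphabetic tokens not written in Georgian script."""
    labels = [script_of(t) for t in text.split()]
    scored = [label for label in labels if label != "other"]
    return sum(label != "georgian" for label in scored) / len(scored) if scored else 0.0

print(script_of("თბილისი"), script_of("Москва"), script_of("London"))
# georgian cyrillic latin
```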

KaWaC is intended to be the primary resource for corpus-based lexicography of Georgian. It will also be used to support the creation of grammars and other language-teaching materials. KaWaC will be a valuable resource for linguists studying regional varieties, as well as both formal, planned language and spontaneous, unplanned language.

https://10times.com/ivacs
Tbilisi, Georgia, 22-26 January 2013. Ivane Javakhishvili Tbilisi State University. Compiler of Finite-State Automaton for the Morphological Processor of the Georgian Language (oral).

Complete dictionaries of natural languages do not exist, just as it is impossible to enumerate an infinite set of numbers or to list all existing proper names. All languages change over time, and so do their lexicons. Every subsystem of a language has its own expressions, and no dictionary of a language can include the complete lexicon of all its varieties. Georgian has about seventeen dialects. Work on the Georgian Dialect Corpus (GDC) is under way, and the present paper addresses part of the technological procedures of that work.

We focus on the compilation of a morphological processor for dialects.

We present a simple yet effective technique that compiles a dialect processor by adapting the existing processor of the standard language. Using our software tools, the system is trained for various dialects by applying known morphophonemic rules.
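A minimal Python sketch of the idea, under stated assumptions: the standard-language lexicon of (word form, analysis) pairs is compiled into a trie, i.e. a deterministic acyclic finite-state automaton over characters, and dialect variants are added by applying string-rewrite rules that stand in for morphophonemic rules. The analysis strings and the single rewrite rule are hypothetical and make no claim about any actual Georgian dialect; the processor's real automaton and rule formalism are not described in the abstract.

```python
def compile_fsa(entries):
    """Compile (form, analysis) pairs into a trie: each state is a dict
    of transitions; the key None holds the analyses accepted there."""
    root = {}
    for form, analysis in entries:
        state = root
        for ch in form:
            state = state.setdefault(ch, {})
        state.setdefault(None, []).append(analysis)
    return root

def lookup(fsa, form):
    """Run the automaton over `form`; return its analyses or []."""
    state = fsa
    for ch in form:
        state = state.get(ch)
        if state is None:
            return []
    return state.get(None, [])

def adapt(entries, rules):
    """Add variants produced by rewrite rules (old, new) to the lexicon."""
    variants = []
    for form, analysis in entries:
        for old, new in rules:
            if old in form:
                variants.append((form.replace(old, new), analysis))
    return entries + variants

standard = [("kaci", "kaci+N+NOM"), ("kacma", "kaci+N+ERG")]
rules = [("ma", "man")]   # hypothetical rule, for illustration only
fsa = compile_fsa(adapt(standard, rules))
print(lookup(fsa, "kacman"))   # ['kaci+N+ERG']
print(lookup(fsa, "kacs"))     # []
```

Because the dialect processor is generated from the standard lexicon plus rules, improving either the lexicon or the rule set propagates to every dialect automaton on recompilation.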

To verify the method, we chose a corpus of Georgian dialects. Starting from the morphological pattern of Standard Georgian, we adapted the morphological processor and then attempted to lemmatize and surface-annotate the dialect corpus.

Section 2 of the paper discusses the compilation of the morphological processor for Georgian, Section 3 the technique for adapting the standard-language processor to dialect varieties, Section 4 morphological analysis, and Section 5 related work.

http://conference.ens-2013.tsu.ge/page/program/11
International Conference "Georgian Language and Modern Technologies" III. Tbilisi, Georgia, 4-5 September 2013. TSU Arnold Chikobava Institute of Linguistics. A New Lexicographic Editor of the Georgian Dialect Corpus (oral).

The initial stage of creating the Georgian Dialect Corpus (GNC) comprised the development of the basic text body, the system of meta-textual annotation, and the reference system based on it. Grammatical markup and a lexicographic component were considered only at the conceptual level. The current stage of the project envisages the morphological annotation of the corpus and the formation of its lexicographic base; therefore, the lexicographic part of the concept was revised.

The new lexicographic editor of the Georgian dialect corpus incorporates all the features of "paper dictionaries". In addition, it is a flexible and effective means of enriching older dictionaries with significant additional information and of creating new ones.

The new dictionary editor makes it possible:

1. to assign a grammatical class to a headword;

2. to describe phonetic variation, which enables the automated tagging of variants in the corpus;

3. to record the grammatical (inflectional) variation of a lexeme;

4. to create dictionaries of collocations and idioms, so that such units can be tagged, and thus searched, in the corpus;

5. to create dictionaries of uninflected dialect words and their numerous phonetic variants, also to be used for automated annotation of the corpus.

It was essential for us that the grammatical standard of the GNC take maximal account of traditional linguistic thought. A list of grammatical properties and their abbreviations was developed in accordance with the most widely recognized standards (EAGLES, Leipzig Glossing Rules…); however, the system is not based on them entirely and strives to reflect fully the peculiarities of the Georgian language and of the Georgian linguistic tradition.

The new editor stores each entry as a database record associated with a set of properties held in individually configurable lookup lists.

A headword is linked to all existing dialect dictionaries and to the text base. Thus all dictionary entries in which the word appears as a headword, and all contexts in the text base that contain the word, are integrated into a single concordance.

Like the headword, the grammatical-variation field is also associated with lists of grammatical properties. Alongside the grammatical variations,

Hence, an entry developed in the new lexicographic editor of the GNC contains enough linguistic information to serve as one of the means of grammatical tagging in the corpus.
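The entry structure described above can be sketched in Python. Everything here is hypothetical (the `Entry` fields, the grammatical-class label, the placeholder forms and the toy contexts): an entry carries a headword, a grammatical class, and its phonetic and grammatical variants, and every variant is mapped back to the headword so that text-base contexts can be tagged and gathered into a single concordance.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """A hypothetical dictionary entry of the lexicographic editor."""
    headword: str
    gram_class: str
    phonetic_variants: list = field(default_factory=list)
    grammatical_variants: dict = field(default_factory=dict)  # form -> properties

    def forms(self):
        """All surface forms that should be tagged with this headword."""
        return {self.headword, *self.phonetic_variants, *self.grammatical_variants}

def concordance(entries, contexts):
    """Map each headword to (context_id, form) pairs attesting any of its forms."""
    form_to_head = {f: e.headword for e in entries for f in e.forms()}
    result = {e.headword: [] for e in entries}
    for ctx_id, text in contexts:
        for token in text.split():
            head = form_to_head.get(token)
            if head is not None:
                result[head].append((ctx_id, token))
    return result

entry = Entry("formA", "N", phonetic_variants=["formA2"],
              grammatical_variants={"formAs": ["DAT", "SG"]})
ctxs = [(1, "x formA y"), (2, "formA2 z formAs")]
print(concordance([entry], ctxs))
# {'formA': [(1, 'formA'), (2, 'formA2'), (2, 'formAs')]}
```

Because variants point back to one headword, a phonetic variant found in a dialect text lands in the same concordance as the headword itself.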

The new lexicographic conception will also be an effective means of developing corpus-based and corpus-driven dictionaries.

https://ice.ge/of/wp-content/uploads/symp_2013_3/masalebi.pdf
Georgian Language: Challenges of the 21st Century · Kutaisi, Georgia · 8 July 2013 · The Parliament of Georgia · Computer Linguistics at the Institute of Control Systems · oral

Fundamental scientific research in computational linguistics has been carried out at the Archil Eliashvili Institute of Control Systems of the Georgian Technical University since the late 1950s.

The report discussed the connections and cooperation of the Department of Language and Speech Modeling with linguistic schools in different countries, and listed the algorithms developed in the department and their software implementations.

Local and international projects carried out in the department, and the software products developed within them, were presented.

https://parliament.ge/print/news/sakartvelos-parlamentshi-konferentsia-kartuli-ena-21-e-saukunis-gamotsvevebi-mimdinareobs
10th International Tbilisi Symposium on Language, Logic and Computation · Gudauri, Georgia · 23-27 September 2013 · The Centre for Language, Logic and Speech at the Tbilisi State University, the Georgian Academy of Sciences and Institute for Logic, Language and Computation (ILLC) of the University of Amsterdam · Dialect Dictionaries with the Functions of Representativeness and Morphological Annotation in Georgian Dialect Corpus · oral

The Georgian Dialect Corpus (http://mygeorgia.ge/gdc) is being developed as an instrument for the study and documentation of the geographical varieties of Georgian. The strategy for developing the GDC was based, on the one hand, on international corpus experience and, on the other, on the traditions of Georgian dialectology and dialectography. In designing the corpus, we did our best to take into account the peculiarities of the Georgian national linguistic and cultural space.

In the Georgian Dialect Corpus, dictionaries serve two goals: achieving representativeness and supporting morphological annotation. The present paper describes in detail how these functions are realized.

New texts are continuously being added to the corpus, and the morphological annotation of the data is in progress at the same time; therefore, for now, the corpus can be queried only by the following meta-textual (non-linguistic) features:

• Language and dialect

• Place of recording

• The informant’s identity

• Thematic and chronological features of a text

• Text type (narrative, poetry, conversation…)

The structure of the corpus is entirely determined by the fact that its technological chain covers the whole cycle of text processing, from data recording to integration into the text base of the corpus. Hence, the planning of field activities already determines the presence of such corpus components as a block of administrative units and information blocks of chronological, thematic, sociological and other features.

To facilitate the morphological annotation of the corpus, we presented the dialect dictionaries as “partially grammatical” dictionaries and applied them in lemmatization and linguistic annotation. We also decided to use the data of Georgian dialect lexicography to enlarge the lexical database (textual base) of the corpus.

https://archive.illc.uva.nl/Tbilisi/Tbilisi2013/
International conference “Historical Corpora 2012” · Frankfurt, Germany · 6-9 December 2012 · Goethe-Universität Frankfurt am Main · The Georgian Dialect Corpus: Problems and Prospects · oral

The Georgian Dialect Corpus is part of a comprehensive project, Linguistic Portrait of Georgia. The team started working on the project in the late twentieth century, and initially it was primarily aimed at the large-scale computer documentation of Georgia’s linguistic diversity. To implement the project, we chose the most effective strategy for language documentation: the corpus strategy.

The Georgian Dialect Corpus is created as a significant segment of the Georgian national communicative pattern. It is conceived as a sub-corpus of “a comprehensive Georgian corpus” and is designed for wide interdisciplinary use.

Presently, two approaches can be distinguished in the corpus representation of dialect data. The first aims at creating a fragmentary corpus of a general character, mostly illustrative and designed to give an impression of the diversity of language subsystems rather than to provide complete linguistic knowledge. This approach has been taken in the Russian National Corpus. According to the second, completely different approach, dialect data should become a scholarly source of a new type for representing and studying not only language but also the linguistic communicative pattern. Corpora of the former type serve to illustrate and popularize, while the latter incorporate many other functions, among them,

In the corpus, each dialect (or any other language subsystem) is presented as an individual sub-corpus. This makes it possible to discuss linguistic phenomena and/or cultural artifacts both within the integral cultural field and within an individual communicative space or regional cultural area.

Presently, the corpus incorporates texts from all Georgian dialects (among them, data from the dialects spoken in Iran, Turkey and Azerbaijan); intensive work on the corpus processing of the Laz text collection is under way.

https://books.google.ge/books/about/Historical_Corpora.html?id=NT4kDwAAQBAJ&redir_esc=y
Batumi II International Symposium in Lexicography · Batumi, Georgia · 18-20 May 2012 · Faculty of Education and Sciences, Batumi Shota Rustaveli State University; Arnold Chikobava Institute of Linguistics at Tbilisi State University; Lexicographic Centre, Iv. Javakhishvili Tbilisi State University · Dictionary and corpus (Lexicographic component of the Georgian dialect corpus) · oral

The most authoritative dictionaries of modern times are corpus-based. There is also “feedback”: lexicographic and grammatical thinking is reflected in the corpus when its linguistic annotation system is developed. When modern corpora are compiled, the lexicographic component is not considered part of the textual database. As soon as we started working on the Corpus of Georgian Dialects, we planned to integrate dialect dictionaries into it as text data. The dictionary editor includes two types of lexicographic information: existing dialect dictionaries and lexical material based on the corpus concordance. Accordingly, the lexicographic element of the corpus has two functions: as a product and as a tool. As a product, a dictionary is created from the text collection and from other dictionaries on the basis of verbal material, and it constitutes a new lexicographic source; part of the work done to create it becomes a tool for the primary morphological and semantic annotation of the corpus.

The report will present the new levels of the Georgian dialect corpus: the lemmatization and dictionary editors. These two levels are directly related to the final stage of the corpus work, morphological annotation. The concept of morphological annotation of the dialect corpus centres on the use of the morphological processor of the Georgian language: equipping it with additional “morphological knowledge” and, consequently, enabling the semi-automatic identification of dialectal word forms (and, on this basis, lemmatization and surface and deep annotation). The unit of reference at the lexicographic level of the corpus is the headword. The primary grammatical and semantic information assigned to it allows surface morphological tagging of the corpus. The report will present the lemmatization and dictionary editors of the corpus and describe in detail their place in the morphological annotation process.

https://bsu.edu.ge/text_files/ge_file_2290_1.pdf
Batumi II International Symposium in Lexicography · Batumi, Georgia · 18-20 May 2012 · The Generator of Explanatory Combinatorial Dictionary of Georgian Language · oral

Language modeling is one of the most important areas of modern linguistics. It is characterized, on the one hand, by the decomposition of the language system into levels (morphological, syntactic and semantic) and, on the other, by the organization of the relationships between these levels. The model can be based on the “Explanatory Combinatorial Dictionary” to ensure the coordinated functioning of the different levels and their effective interaction. All the information about a lexeme in the Computational Explanatory Combinatorial Dictionary of the Georgian Language is divided into five zones: the first contains the headword, the second the explanation of the word, the third its morphological model, the fourth its semantic-syntactic model, and the fifth the list of its lexical functions. The project “Automatic explanatory-combinatory dictionary as a basis of Georgian language modeling” has been developed with the support of the Rustaveli National Scientific Foundation. To create this system, we use modern approaches to Georgian syntax and semantics, namely: the “layered” syntactic structure; the theory of lexical parameters [I. Melchuk]; and the method of synonymic series [I. Apresjan]. For Kartvelology, this means widening, intensifying and renewing the methods of describing the Georgian language in line with new international standards; for the modern theories mentioned above, it means retesting their cross-linguistic serviceability (fitness); for computational linguistics, this work provides a powerful basis for creating a complete functional model of the Georgian language.
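The five-zone layout of an entry maps naturally onto a record type. The sketch below is one possible representation, assuming each zone is a separate field and the lexical-function zone maps a function name to its values. The class name, field names and the English entry (with the standard Meaning-Text examples Oper1 and Magn) are illustrative only; the Georgian dictionary's actual format and content are not shown in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ECDEntry:
    headword: str                                   # zone 1: title word
    explanation: str                                # zone 2: explanation
    morphological_model: str                        # zone 3: morphological model
    semantic_syntactic_model: Dict[str, str] = field(default_factory=dict)  # zone 4
    lexical_functions: Dict[str, List[str]] = field(default_factory=dict)   # zone 5

    def apply(self, lf: str) -> List[str]:
        """Return the values of a lexical function, or [] if none are recorded."""
        return self.lexical_functions.get(lf, [])


# Illustrative English entry, not taken from the Georgian dictionary.
attention = ECDEntry(
    headword="attention",
    explanation="X's concentrating his mind on Y",
    morphological_model="noun, uncountable",
    semantic_syntactic_model={"X": "of N", "Y": "to N"},
    lexical_functions={"Oper1": ["pay"], "Magn": ["close", "undivided"]},
)
print(attention.apply("Oper1"))  # ['pay']
print(attention.apply("Anti"))   # []
```

Keeping the zones as separate fields lets the morphological, semantic-syntactic and lexical-function levels be queried independently while still belonging to one lexeme.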

https://bsu.edu.ge/text_files/ge_file_2290_1.pdf
Corpus Linguistics 2011 · Saint Petersburg, Russia · 27-29 June 2011 · Saint Petersburg State University · Software tools for morphological corpus annotation · oral

Morphological text markup is an important aspect of creating a language corpus. Over the past years, the Department of Language and Speech Systems of the Institute of Control Systems of the Georgian Technical University has been developing a multilingual morphological processor for subsequent use in a wide class of theoretical and applied problems. The processor is especially valuable because it makes semi-automatic morphological annotation of a corpus possible.

Based on the algorithm of the morphological analyzer, the department has created a package of software tools, GeoTools, with which the user can obtain a deeply annotated corpus. Software utilities also make it possible to align and process parallel texts. The interface and data processing can be set up for three languages: Georgian, Russian and English. The user can set the markup level (surface or deep) and the description of morphological characteristics with the appropriate markers. For word forms that are not described in the corpus, the algorithm generates one or more hypothetical inflection models. It is also possible to identify the paradigm of a single lexeme, to record and reproduce the paradigm of a given lexeme, and to record and search for all lemmas with an identical paradigm. Words and lemmas can be sorted in both normal and inverted letter order (within a word form), data can be filtered by the same features, and much more. Components for the syntactic markup of text are currently being introduced into the system.
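Searching for all lemmas with an identical paradigm can be approximated by comparing ending patterns. The sketch below assumes each lemma's forms are listed in a fixed grammatical order and treats their longest common prefix as the stem; `signature`, `group_by_paradigm` and the toy English lexicon are hypothetical, not GeoTools functions.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def signature(forms: List[str]) -> Tuple[str, ...]:
    """Endings left after removing the longest common prefix (taken as the stem)."""
    stem = os.path.commonprefix(forms)
    return tuple(form[len(stem):] for form in forms)


def group_by_paradigm(lexicon: Dict[str, List[str]]) -> List[List[str]]:
    """Group lemmas whose forms, listed in the same order, share one ending pattern."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for lemma, forms in lexicon.items():
        groups[signature(forms)].append(lemma)
    return [sorted(lemmas) for lemmas in groups.values()]


# Toy English lexicon: base, 3rd person singular, past tense.
LEXICON = {
    "walk": ["walk", "walks", "walked"],
    "carry": ["carry", "carries", "carried"],
    "talk": ["talk", "talks", "talked"],
}
print(group_by_paradigm(LEXICON))  # [['talk', 'walk'], ['carry']]
```

A common-prefix stem is only a heuristic: stems with internal alternations (for example Georgian vowel syncope) would need the processor's own inflection models instead.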

Until now, there have been no annotated text corpora for the Georgian language at all. Using the GeoTools software system, our department is semi-automatically processing parallel texts of the novel Data Tutashkhia by the modern Georgian writer Chabua Amirejibi. In addition, together with the Department of Computer Processing of Linguistic Data of the A. Chikobava Institute of Linguistics, we are working on the annotation of the dialect corpus.

The report described the GeoTools tools and the stages and data entry procedures of text processing. The basic principles of the algorithms for aligning and annotating the corpus were also outlined.

https://www.ozon.ru/context/detail/id/138917959/
Georgian Language and Modern Technologies · Tbilisi, Georgia · 7-8 July 2011 · TSU Arnold Chikobava Institute of Linguistics · Dialect Dictionaries in the Corpus (GDC) and the Issues of Semi-automatic Lemmatisation · oral

The Corpus of Georgian Dialects is compiled with the assistance of the Shota Rustaveli National Science Foundation and envisages the creation of a vast corpus of the subsystems of the Georgian language, equipped with an apparatus of linguistic and metalinguistic annotation.

One significant feature of the concept of the Corpus of Georgian Dialects (GDC) that is new to international corpus-linguistic practice is the integration of dictionaries into the corpus as a textual component.

Alongside the text-insertion editor, the corpus provides a dictionary-addition editor, which makes it possible to capture all the lexicographic characteristics of the GDC discussed above.

Obviously, including a dictionary (or dictionary materials of various kinds) in the corpus is an effective way of increasing its representativeness. This is particularly true of a dialect corpus, since the thematic, genre or stylistic “balancing” of dialect texts is far more problematic than that of literary texts.

We are currently elaborating the system of morphological annotation in the corpus (GDC). In this respect, the first step is lemmatisation. While lemmatisation is an easy, even trivial, task in corpora of literary languages that have an exhaustive morphological description, in dialect corpora it is, as a rule, done manually.

Lemmatisation in our corpus is oriented towards the literary form, since in this process dialectal and literary lemmas are “equalised”. Naturally, before this stage is reached, the lemmatisation of the dialectal text itself must be completed. At the same time, the material should also be annotated for part of speech.

We decided to use the “left-hand” side of the dialect dictionary for both processes: partial lemmatisation and part-of-speech annotation.
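The combined use of the dictionary's left-hand side for lemmatisation and part-of-speech annotation can be sketched as a single lookup. This assumes that each dialect headword carries its literary counterpart and a POS tag; the table `LEFT_SIDE`, the `Headword` type, the `annotate` function and the placeholder forms are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Headword(NamedTuple):
    literary_lemma: str  # literary form the dialect headword is "equalised" with
    pos: str             # part-of-speech tag


# Hypothetical left-hand side of a dialect dictionary.
LEFT_SIDE: Dict[str, Headword] = {
    "dialForm1": Headword("litLemma1", "noun"),
    "dialForm2": Headword("litLemma2", "verb"),
}


def annotate(tokens: List[str]) -> List[Dict[str, Optional[str]]]:
    """Partial lemmatisation and POS tagging by exact headword lookup.

    Only tokens identical to a headword are annotated; all other tokens
    (e.g. inflected forms) keep None and are left for later processing.
    """
    result = []
    for tok in tokens:
        hw = LEFT_SIDE.get(tok)
        result.append({
            "token": tok,
            "lemma": hw.literary_lemma if hw else None,
            "pos": hw.pos if hw else None,
        })
    return result


print(annotate(["dialForm1", "unknownForm"]))
# [{'token': 'dialForm1', 'lemma': 'litLemma1', 'pos': 'noun'},
#  {'token': 'unknownForm', 'lemma': None, 'pos': None}]
```

Exact matching is why the lemmatisation remains partial: inflected dialect forms still need the morphological processor or manual work.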

Arguably, including the dictionary as one of the tools for lemmatisation and primary annotation can be used effectively when building linguistic corpora of languages whose computational processing has not yet been completed.

http://www.ice.ge/symposium/symp2011_2/konferencia-2011.pdf
International Conference "Georgian Language and Modern Technologies" · Tbilisi, Georgia · 7-8 July 2011 · TSU Arnold Chikobava Institute of Linguistics · Some Questions of the Formation of the Plural in the Georgian Morphological Processor · oral

The article discusses the peculiarities of the plural declension of some Georgian nouns. Linguistic phenomena are subject to certain regularities, but besides the general rules there are exceptions. The focus is on the formation of the plural of certain nouns. According to the established rule, nouns of a certain group, called uncountable nouns, do not take plural forms; these include nouns of substance, abstract nouns and collective nouns. It is not uncommon, however, to use such nouns in the plural with particular semantic loads. Some adjectives become nouns in the plural (reds, greens, the rich, the poor, etc.). For illustration, the article presents many collocations that are already well established in Georgian, such as „ქართული ღვინოები“ (Georgian wines), „მარილების დაგროვება“ (salt accumulation), „მინერალური წყლები“ (mineral waters) (nouns of substance); „ფიქრები“ (thoughts), „მოტივები“ (motives), „არჩევნები“ (elections) (abstract nouns); „გუნდები“ (teams), „კრებები“ (congregations) (collective nouns); and others.
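A morphological processor can block the plural of uncountable nouns by default and still allow it when a special semantic load is marked. The sketch below assumes a simplified rule for the nominative plural (stem + ებ + ი): the final -ი or -ა is dropped, stems in -ე, -ო or -უ keep their vowel, and syncopating stems come from an exception table. `UNCOUNTABLE`, `SYNCOPE` and `plural` are hypothetical names, the tables hold only the nouns from the examples above, and irregular stems such as ჩაი are not handled.

```python
# Simplified sketch of Georgian nominative-plural formation with -ებ-.
# Hypothetical tables restricted to the nouns cited in the abstract.

# Nouns flagged as uncountable (substance, abstract, collective).
UNCOUNTABLE = {"ღვინო", "მარილი", "წყალი", "ფიქრი", "მოტივი", "არჩევანი", "გუნდი", "კრება"}

# Stems with vowel loss (syncope) in the plural, listed explicitly.
SYNCOPE = {"წყალი": "წყლ", "არჩევანი": "არჩევნ"}


def plural(noun: str, semantic_plural: bool = False) -> str:
    """Return the nominative plural (stem + ებ + ი) of a nominative singular noun.

    Uncountable nouns are refused unless the caller marks a special
    semantic load with semantic_plural=True.
    """
    if noun in UNCOUNTABLE and not semantic_plural:
        raise ValueError(f"{noun} is marked as uncountable")
    if noun in SYNCOPE:
        stem = SYNCOPE[noun]
    elif noun.endswith(("ი", "ა")):
        stem = noun[:-1]  # drop the nominative -ი or the truncating stem vowel -ა
    else:
        stem = noun       # stems in -ე, -ო, -უ keep their vowel
    return stem + "ები"


for n in ["ღვინო", "მარილი", "წყალი", "არჩევანი", "კრება"]:
    print(plural(n, semantic_plural=True))
```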

http://www.ice.ge/symposium/symp2011_2/konferencia-2011.pdf
9th International Tbilisi Symposium on Logic, Language, and Computation · Kutaisi, Georgia · 26-30 September 2011 · The Centre for Language, Logic and Speech at the Tbilisi State University, the Georgian Academy of Sciences and Institute for Logic, Language and Computation (ILLC) of the University of Amsterdam · The issue of Morphological Annotation of the Georgian Dialect Corpus · oral

The Georgian dialect corpus is being created at the Arn. Chikobava Institute of Linguistics. Its purpose is to present the text collections of all Georgian dialects by means of corpus technology. At present, the corpus contains narrative and lexicographic data from 16 Georgian dialects. Potentially, however, it can present data from other Kartvelian languages as well.

Morphological annotation of the corpus, now being carried out by the working group, is one of the most important stages of the work. This article discusses the five-level process of developing the morphological annotation and presents the mechanism for forming and expanding the databases that contain the relevant linguistic information.

The Georgian Dialect Corpus (GDC, http://mygeorgia.ge/gdc) is the first corpus project for the Georgian language. Its test version has been available to Internet users for the last two years and has received corresponding professional responses. Several researchers interested in Georgian dialects, and in the Georgian language generally, are already working with the GDC.

Among the various components, we have given priority to preparing and filling the text collection and to developing the logical architecture of the corpus. An unprecedented amount of work has been done. As a result, the corpus already enables rapid search and effective investigation of the material. Textual and lexical data of all Georgian dialects are integrated into the corpus.

During morphological annotation, the marker of dialect origin is disregarded. The glossary takes the form of an ordinary alphabetical concordance in which every attested word form that differs from the literary word form is treated as a dialect form.

The primary morphological annotation, or part-of-speech tagging, enables us to divide the word forms represented in the dictionary into several grammatical groups.
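The two glossary rules above, a form differing from its literary form counts as dialectal, and POS tags split the forms into grammatical groups, can be expressed in a few lines. This minimal sketch assumes each glossary row is an (attested form, literary form, POS) triple; `Row`, `is_dialectal`, `pos_groups` and the placeholder rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # (attested form, literary form, part of speech)


def is_dialectal(row: Row) -> bool:
    """A form that differs from its literary counterpart counts as dialectal."""
    form, literary, _ = row
    return form != literary


def pos_groups(rows: List[Row]) -> Dict[str, List[str]]:
    """Split the alphabetically sorted forms into part-of-speech groups."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for form, _, pos in sorted(rows):  # alphabetical, like the concordance
        groups[pos].append(form)
    return dict(groups)


rows = [("formB", "formB", "noun"), ("formA2", "formA", "verb"), ("formC2", "formC", "noun")]
print([r[0] for r in rows if is_dialectal(r)])  # ['formA2', 'formC2']
print(pos_groups(rows))  # {'verb': ['formA2'], 'noun': ['formB', 'formC2']}
```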


https://archive.illc.uva.nl/Tbilisi/Tbilisi2011/Programme/index.html
6th International Contrastive Linguistics Conference (ICLC6) · Berlin, Germany · 30 September-2 October 2010 · Freie Universität Berlin · Interactive system for compilation of multilingual concordancers · oral

Nowadays, a bilingual corpus of parallel texts is an important instrument for contrastive analysis and various linguistic studies. The concordance is at the centre of corpus linguistics, because it provides access to many important language patterns in texts. We consider an approach to automating the creation of multilingual (Georgian-English-Russian) concordancers. We use a new system, “GeoTrans”, for dictionary management and for aligning bilingual parallel texts. The system applies a rule-based morphological model, so it can also generate the necessary rules for words not included in the “GeoTrans” database. The application program interface of “GeoTrans” supports the acquisition of morphological descriptions according to word paradigms.
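A multilingual concordancer over aligned texts reduces to searching one side of each aligned unit and returning the whole unit. The sketch assumes sentence-level alignment is already done and that the set of word forms to search for (which GeoTrans would obtain from its morphological rules) is supplied by hand. `Aligned`, `concordance` and the sample sentences are illustrative and not taken from the Rustaveli parallel texts.

```python
from typing import Dict, Iterable, List, Set

Aligned = Dict[str, str]  # language code -> sentence in that language


def concordance(units: Iterable[Aligned], lang: str, forms: Set[str]) -> List[Aligned]:
    """Return aligned units whose `lang` side contains one of `forms`.

    `forms` stands in for the paradigm a morphological generator would
    supply; in this sketch it is passed in by hand.
    """
    hits = []
    for unit in units:
        tokens = {tok.strip(".,;:!?").lower() for tok in unit[lang].split()}
        if tokens & forms:
            hits.append(unit)
    return hits


units = [
    {"en": "The knight rode on.", "ru": "Витязь ехал дальше."},
    {"en": "Two knights met.", "ru": "Встретились два витязя."},
    {"en": "Night fell.", "ru": "Наступила ночь."},
]
for unit in concordance(units, "en", {"knight", "knights"}):
    print(unit["en"], "|", unit["ru"])
```

Searching by a full set of word forms rather than a single string is what lets a morphology-aware concordancer find inflected occurrences in a language like Georgian or Russian.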

The system uses expert-system technology, in particular Marvin Minsky's frame-based representation. To describe a given word form in frames, it is necessary to determine the phenomena connected with the word, the reasons causing these phenomena and the script of the different phenomena. The system creates a library of frames in which the cause-effect relations of all morphological phenomena are represented. In our “GeoTrans” system, a "phenomenon" means the determination of those "facts" that operate on the basic dictionary units, the “terminals”. Linguistically, these "facts" are the morphological operations, i.e. procedures (in expert-system terms), by means of which a word form is obtained from the “terminal”.
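The frame idea (phenomenon, cause, procedure, and a script that orders them) can be illustrated with a small sketch. It assumes a frame is triggered by a condition on the current form and its grammatical features, and that the frames in the script fire in order on the terminal. The `Frame` class, the `SCRIPT` list and the English spelling example are hypothetical; they are not the GeoTrans frame library.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Frame:
    phenomenon: str                          # name of the morphological phenomenon
    cause: Callable[[str, Set[str]], bool]   # condition (reason) that triggers it
    procedure: Callable[[str], str]          # operation applied to the current form


def consonant_before_final_y(form: str) -> bool:
    return len(form) > 1 and form.endswith("y") and form[-2] not in "aeiou"


# Hypothetical script: frames are tried in this order (English example).
SCRIPT: List[Frame] = [
    Frame("y -> i before -ed",
          lambda f, feats: "past" in feats and consonant_before_final_y(f),
          lambda f: f[:-1] + "i"),
    Frame("past-tense suffix",
          lambda f, feats: "past" in feats,
          lambda f: f + "ed"),
]


def generate(terminal: str, features: Set[str]) -> Tuple[str, List[str]]:
    """Derive a word form from a terminal and report which phenomena fired."""
    form, fired = terminal, []
    for frame in SCRIPT:
        if frame.cause(form, features):
            form = frame.procedure(form)
            fired.append(frame.phenomenon)
    return form, fired


print(generate("carry", {"past"}))  # ('carried', ['y -> i before -ed', 'past-tense suffix'])
print(generate("play", {"past"}))   # ('played', ['past-tense suffix'])
```

Returning the list of fired phenomena keeps the cause-effect trace explicit, which is the property that makes a frame library inspectable by a linguist.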

The system is implemented in as language-independent a way as possible. It has been tested on the poem "The Knight in the Tiger's Skin" by the Georgian poet Shota Rustaveli, using parallel Georgian, English and Russian texts. Work is currently under way on creating and correcting the primary and secondary lexical databases, supported by the research of linguists at Ilia Chavchavadze University.

http://listserv.linguistlist.org/pipermail/hpsg-l/2010-March/002229.html
Batumi I International Symposium in Lexicography · Batumi, Georgia · 14-16 May 2010 · Batumi Shota Rustaveli State University, The Faculty of the Humanities · Universal morphological analyzer as the tool of lexicographical studies · oral

In the Internet era, traditional printed dictionaries are gradually disappearing and their virtual counterparts are taking their place. A review of modern Internet dictionaries shows that many of them are not limited to traditional dictionary representations: most of them are “armed” with morphological, syntactic and semantic analyzers. We present a universal morphological analyzer which shows, using three languages (Georgian, Russian and English) as examples, how simply any computer dictionary can be equipped with a morphological analyzer and used for various lexicographic studies.

Traditionally, knowledge of a language's morphology is represented by describing its word paradigms, i.e. the paradigms of its terminals. The GeoTrans system that we propose uses expert-system technology, in particular Marvin Minsky's frame-based representation. To describe a given word form in frames, it is necessary to determine the phenomena connected with the word, the reasons causing these phenomena and the script of the different phenomena. We turn to the paradigm of the word and determine the cause-effect relations between the reasons and the consequences. The system creates a library of frames in which the cause-effect relations of all morphological phenomena are represented.

In our expert system, a "phenomenon" means the determination of those "facts" that operate on the basic dictionary units, the terminals. Linguistically, these "facts" are the morphological operations, i.e. procedures (in expert-system terms), by means of which a word form is obtained from the terminal.

https://ice.ge/new/batumi/programa_geo.pdf
Georgian Language and Modern Technologies · Tbilisi, Georgia · 20-21 October 2009 · TSU Arnold Chikobava Institute of Linguistics · Georgian Computer Prompter · oral

The Georgian Computer Prompter is software that helps people with disabilities write in Georgian on a computer. As is known, no current software can yet solve this problem. The system suggests correct word forms and makes it possible to use the keyboard with minimal effort. The research group at KTH (Department of Speech, Music and Hearing, Royal Institute of Technology, Sweden) has long been working on an assistance system for selecting the desired and correct forms; their program that performs this function is the word predictor “Prophet”. A word predictor suggests words while a person is writing, based either on the preceding word or on the first letter(s) of the current word. We decided to create a similar Georgian system. For the Georgian version of Prophet, which we call “Computer Prompter”, it is necessary to adapt the program code of Prophet to the Georgian language; to create a Georgian text corpus (no less than a million words); to enlarge the database of the Georgian dictionary; to develop a morphological processor of the Georgian language; and to create a dictionary of affixes and modify the dictionary database of Prophet. So far, the Georgian text corpus contains about one million words. To create it, we used Georgian Internet sites. In addition, to ensure thematic variety, we collected and developed texts on twelve themes: Georgian history, Religion, Culture, Medicine, Sport, Economics, Politics, Show business, Family, Children, People and Society. The text corpus requires special processing; for example, the input file must contain only one word per line. As a result of the project, a Georgian version of Prophet will be created. The software will help users select grammatically correct words while typing Georgian texts, on the basis of the word sequence.
The dictionary database will contain 100,000 basic words and all the rules for their derivation. At present, 30,000 units have been added to the dictionary of basic forms and rules.
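The two prediction cues named above, the preceding word and the first letters of the current word, can be combined in a toy predictor. The sketch below is a generic frequency-based approach (bigram counts with a unigram fallback) built from a one-word-per-line corpus; it is an assumption for illustration, not Prophet's actual algorithm, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class Predictor:
    """Toy word predictor: completes a prefix, preferring words seen after `prev`."""

    def __init__(self, tokens: List[str]) -> None:
        self.unigrams: Counter = Counter(tokens)
        self.bigrams: Dict[str, Counter] = defaultdict(Counter)
        for prev, nxt in zip(tokens, tokens[1:]):
            self.bigrams[prev][nxt] += 1

    def _rank(self, counts: Counter, prefix: str, k: int) -> List[str]:
        matches = [(w, c) for w, c in counts.items() if w.startswith(prefix)]
        matches.sort(key=lambda wc: (-wc[1], wc[0]))  # frequency, then alphabet
        return [w for w, _ in matches[:k]]

    def suggest(self, prefix: str, prev: Optional[str] = None, k: int = 3) -> List[str]:
        if prev is not None and prev in self.bigrams:
            hits = self._rank(self.bigrams[prev], prefix, k)
            if hits:
                return hits
        return self._rank(self.unigrams, prefix, k)  # fall back to unigram counts


# One word per line, as in the input format described above.
corpus_file = "the\ncat\nsat\nthe\ncat\nran\nthe\ndog\nsat"
p = Predictor(corpus_file.splitlines())
print(p.suggest("", prev="the"))   # ['cat', 'dog']
print(p.suggest("r", prev="the"))  # ['ran']
```

For Georgian, the morphological processor would add value on top of such counts by offering grammatically correct forms that never occurred in the training corpus.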

https://www.ice.ge/new/pages/news/konferencia.pdf
Georgian Language and Modern Technologies · Tbilisi, Georgia · 20-21 October 2009 · TSU Arnold Chikobava Institute of Linguistics · Computational Implementation of the Georgian Numerals · oral

We present a method for the computational implementation of the Georgian numeral system in the Multilingual Expert System of Language Modeling (MESLM).

In the MESLM system, the morphological dictionary is divided into a canonical dictionary and two rule lexicons. The canonical dictionary (CaDic) is a database file containing correspondences between the canonical forms of words and the markers of morphotactic rules. The rules of morphotactics are described in the databases of two lexicons: the inflection rule lexicon (InRuLex) and the derivation rule lexicon (DeRuLex). Noun declension, verb conjugation, comparison of adjectives, productive derivation and compounding are supported by continuation lexicons: the inflection (InFeLex) and derivation (DeFeLex) grammatical feature lexicons.
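The division of labour between CaDic and InRuLex can be sketched with two tables: canonical form to rule marker, and marker to a list of inflection rules. This sketch assumes a rule is (case, number of final characters to cut, ending) and covers only the singular case forms of two nouns, კაცი and დედა. The markers `N1`/`N2`, the rule format and the functions are hypothetical, and DeRuLex and the feature lexicons are omitted.

```python
from typing import Dict, List, Tuple

# Canonical dictionary (CaDic): canonical form -> marker of its inflection rule.
CADIC: Dict[str, str] = {"კაცი": "N1", "დედა": "N2"}

# Inflection rule lexicon (InRuLex): marker -> (case, chars to cut, ending).
INRULEX: Dict[str, List[Tuple[str, int, str]]] = {
    "N1": [("nom", 1, "ი"), ("erg", 1, "მა"), ("dat", 1, "ს"), ("gen", 1, "ის"),
           ("inst", 1, "ით"), ("adv", 1, "ად"), ("voc", 1, "ო")],
    "N2": [("nom", 0, ""), ("erg", 0, "მ"), ("dat", 0, "ს"), ("gen", 1, "ის"),
           ("inst", 1, "ით"), ("adv", 0, "დ"), ("voc", 0, "ვ")],
}


def generate(canonical: str) -> Dict[str, str]:
    """Synthesis: canonical form -> {case: word form}."""
    forms = {}
    for case, cut, ending in INRULEX[CADIC[canonical]]:
        stem = canonical[:len(canonical) - cut]  # cut=0 keeps the whole form
        forms[case] = stem + ending
    return forms


def analyse(word_form: str) -> List[Tuple[str, str]]:
    """Analysis by generate-and-match: word form -> [(canonical form, case)]."""
    return [(canon, case) for canon in CADIC
            for case, form in generate(canon).items() if form == word_form]


print(generate("დედა")["gen"])  # დედის
print(analyse("კაცმა"))         # [('კაცი', 'erg')]
```

Running the same tables in both directions is one simple way to obtain the bidirectionality described for the system.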

The system is bidirectional and represents the inflection of numerals with markers that indicate their ordinality and morphological features. It can be used to analyse complex numerals in any case of the system and to generate the corresponding expression, represented with markers and numbers.

In Georgian, converting numbers into numerals is much more complicated than in languages such as English, because Georgian complex numerals are inflected and numbers are expressed in a mixed base-20/decimal system.

All numerals from eleven to nineteen are complex numerals. Arguably, the arithmetical “base” can be used to construct numeral expressions. In Georgian, the numbers from 20 to 99 are expressed in the base-20 system. Numbers above 100 are expressed in a mixed base-20/decimal way, but schematically the interpretation of numeral expressions is very similar in most languages: [M • F + R]. The value (V) of the numeral is obtained by multiplying M (multiplicand) by F (factor) and adding R (remainder) to the result. In this recursive structure, the M and R components may themselves be complex numerals. Ordinal and fractional numerals, however, are always written as one word.
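The recursive [M • F + R] scheme can be checked arithmetically with a short function. This sketch assumes F = 20 for 20-99 and the decimal factors 100 and 1000 above that, treats 0-19 as atoms, and writes '*' for '•' so the result can be evaluated. It produces only the arithmetic skeleton, not Georgian word forms, and the name `decompose` is hypothetical.

```python
def decompose(n: int) -> str:
    """Write n in the recursive [M • F + R] scheme, using '*' for '•'.

    F is 20 for 20..99 (vigesimal) and 100 or 1000 above that (decimal).
    """
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch handles 0 <= n < 1,000,000")
    if n < 20:
        return str(n)                        # atomic numeral, no decomposition
    factor = 20 if n < 100 else 100 if n < 1000 else 1000
    m, r = divmod(n, factor)
    m_text = decompose(m)
    if m >= 20:                              # a complex multiplicand needs brackets
        m_text = f"({m_text})"
    expr = f"{m_text}*{factor}"
    return f"{expr}+{decompose(r)}" if r else expr


print(decompose(57))     # 2*20+17
print(decompose(357))    # 3*100+2*20+17
print(decompose(25000))  # (1*20+5)*1000
```

For example, 57 decomposes as 2 • 20 + 17 (M = 2, F = 20, R = 17), which mirrors the structure of the Georgian numeral for 57 ("two twenties and seventeen").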

https://www.ice.ge/new/pages/news/konferencia.pdf
8th International Symposium on Language, Logic and Computation · Tbilisi, Georgia · 1-5 October 2009 · The Centre for Language, Logic and Speech at the Tbilisi State University, the Georgian Academy of Sciences and Institute for Logic, Language and Computation (ILLC) of the University of Amsterdam · Modeling of derivation in the Multilingual Expert System · oral


https://archive.illc.uva.nl/Tbilisi/Tbilisi2007/index.php%3Fpage=15.html
7th International Symposium on Language, Logic and Computation · Tbilisi, Georgia · 1-5 October 2007 · The Centre for Language, Logic and Speech at the Tbilisi State University, the Georgian Academy of Sciences and Institute for Logic, Language and Computation (ILLC) of the University of Amsterdam · Three Aspects of Language Modelling · oral

From one point of view, Language Modelling (LM) can be considered an axis of linguistics. It is precisely within its framework that the various basic components of language should be unified and thus brought into agreement and conformity. Here we consider one more dimension of such relations: the triple relation between language knowledge, its use and its acquisition. This line of research began only recently, and so far only some sketches of the knowledge/use relation are ready for demonstration; even these have not been sufficiently tested and do not yet guarantee fully correct functioning.

Following the widely accepted view that language knowledge can be represented by a generative grammar, the productive components (synthesis and analysis) are based on the morphological generator Bf→P, which transforms each basic (dictionary) form Bf into all members of the corresponding paradigm P. This component of the scheme is the most fully developed: its object is Russian morphology, and it is based on A.A. Zaliznyak's dictionary. Synthesis obviously poses no serious problems within the Bf→P system if we assume that its input is just Bf’, which is also the generator's input, and that the choice of the required form is immediately determined by the grammatical part of the input. The analytic component faces a considerably harder version of the problem. Here the direct comparison used in dictionary search is replaced by attempts to find some resemblance between the input word form Wf and Bf’. This more specific process requires more effective means of narrowing the search area, both in the dictionary and in the generated paradigm.
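The Bf→P scheme, with synthesis as direct selection and analysis as a narrowed search, can be sketched on two Russian nouns. This assumes two toy paradigm classes standing in for Zaliznyak's types, and a stem-prefix test as the means of reducing the search area; a real system would use an indexed structure rather than a linear scan. `CLASSES`, `DICTIONARY` and the functions are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy paradigm classes: list of (grammeme, ending) added to the stem.
CLASSES: Dict[str, List[Tuple[str, str]]] = {
    "m_hard": [("nom", ""), ("gen", "а"), ("dat", "у"), ("acc", ""), ("ins", "ом"), ("prep", "е")],
    "f_a":    [("nom", "а"), ("gen", "и"), ("dat", "е"), ("acc", "у"), ("ins", "ой"), ("prep", "е")],
}

# Dictionary: basic form Bf -> (stem, paradigm class).
DICTIONARY: Dict[str, Tuple[str, str]] = {"стол": ("стол", "m_hard"), "книга": ("книг", "f_a")}


def paradigm(bf: str) -> List[Tuple[str, str]]:
    """Generator Bf -> P: all (grammeme, word form) pairs for a basic form."""
    stem, cls = DICTIONARY[bf]
    return [(g, stem + e) for g, e in CLASSES[cls]]


def synthesise(bf: str, grammeme: str) -> str:
    """Synthesis: pick the required member of P by its grammatical label."""
    return dict(paradigm(bf))[grammeme]


def analyse(wf: str) -> List[Tuple[str, str]]:
    """Analysis: keep only Bfs whose stem is a prefix of Wf, then match within P."""
    candidates = [bf for bf, (stem, _) in DICTIONARY.items() if wf.startswith(stem)]
    return [(bf, g) for bf in candidates for g, form in paradigm(bf) if form == wf]


print(synthesise("стол", "ins"))  # столом
print(analyse("книге"))           # [('книга', 'dat'), ('книга', 'prep')]
```

The analysis result shows why the analytic direction is harder: one word form may match several members of a paradigm, and the candidate set must be narrowed before any paradigm is generated.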

https://archive.illc.uva.nl/Tbilisi/Tbilisi2007/index.php%3Fpage=15.html
Natural Language Processing | Tbilisi, Georgia | 20-22 October 2005 | Arnold Chikobava Institute of Linguistics | Automatic system of unification of the morphological representation of the Russian language | oral

Today, many new user programs are being developed for the computer processing of printed Russian texts, and the Russian morphology module plays a key role in their implementation. Building a conventional automated morphological processor, whether an analyzer, a synthesizer, a tagger or a translator, requires computational linguists to solve three basic tasks: second, to identify groups of words that share the same paradigm; and third, to compile a dictionary in which each headword is assigned its paradigm or paradigms (in the case of homonyms). As is well known, A. A. Zaliznyak's dictionary remains the most complete description of Russian word inflection in modern Russian grammar. In it, each word carries grammatical marks and indices that point unambiguously to the sections of the grammatical reference where the declension or conjugation schemes characteristic of particular classes of headwords are given. In addition, the dictionary is arranged in reverse alphabetical order (words sorted by their endings). As a result, words with the same paradigm are placed close to each other, which makes it easier to combine them into word arrays sharing that paradigm.
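To illustrate why reverse ordering helps, here is a small Python sketch (the helper names `reverse_sorted` and `clusters_by_ending` are hypothetical): sorting by the reversed spelling places words with shared endings next to each other, so one `itertools.groupby` pass can collect each cluster. It relies on Python comparing strings by code point, which matches Russian alphabetical order for lowercase letters other than ё.

```python
from itertools import groupby


def reverse_sorted(words: list[str]) -> list[str]:
    """Order words as a reverse dictionary does: by their letters read right to left."""
    # Code-point order equals alphabetical order for lowercase а..я (ё excluded).
    return sorted(words, key=lambda w: w[::-1])


def clusters_by_ending(words: list[str], n: int = 2) -> dict[str, list[str]]:
    """Group reverse-sorted words by their last n letters.

    Words sharing an ending are contiguous after reverse sorting, so a
    single groupby pass collects each cluster.
    """
    ordered = reverse_sorted(words)
    return {end: list(group) for end, group in groupby(ordered, key=lambda w: w[-n:])}
```

With `["стол", "лампа", "вол", "рама", "пол", "мама"]`, `reverse_sorted` yields `["мама", "рама", "лампа", "вол", "пол", "стол"]`: the -а nouns and the -ол nouns each form a contiguous block. A shared ending is of course only a heuristic for a shared paradigm; the dictionary's indices remain the authoritative criterion.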

The main achievement of our application program is that it identifies all types of symbols used in the entries of A. A. Zaliznyak's dictionary (1977) and, on that basis, provides a computer system that automatically extracts arrays of words with the same declension. Formalizing the marking and indexing used in the dictionary entries was complicated not only by the large number of distinct characteristics of the headwords (24 features), but also by their ambiguity: the meaning of a mark can depend on several other characteristics.

The linguist-user can pre-classify the dictionary by any part of speech and by a main and, if needed, auxiliary morphological characteristic (or characteristics), and then group the lexical items of the chosen class by every combination of characteristics present in their entries.
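A minimal Python sketch of this two-step classification, assuming entries are plain records: `ENTRIES`, its feature names and the functions `classify` and `all_groupings` are hypothetical and model only three of the 24 features. The part of speech is pre-selected first, then words are grouped by the tuple of values of the chosen features, optionally for every combination of them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; the real system distinguishes 24 features per entry,
# and these feature names are illustrative only.
ENTRIES = [
    {"word": "стол", "pos": "noun", "gender": "m", "animacy": "inan"},
    {"word": "вол", "pos": "noun", "gender": "m", "animacy": "anim"},
    {"word": "лампа", "pos": "noun", "gender": "f", "animacy": "inan"},
    {"word": "рама", "pos": "noun", "gender": "f", "animacy": "inan"},
    {"word": "читать", "pos": "verb", "aspect": "impf"},
]


def classify(entries, pos, features):
    """Pre-select one part of speech, then group its words by the tuple of
    values of the chosen features (a missing feature groups under None)."""
    groups = defaultdict(list)
    for entry in entries:
        if entry["pos"] == pos:
            key = tuple(entry.get(f) for f in features)
            groups[key].append(entry["word"])
    return dict(groups)


def all_groupings(entries, pos, features):
    """Classify by every non-empty combination of the given features."""
    return {
        combo: classify(entries, pos, combo)
        for r in range(1, len(features) + 1)
        for combo in combinations(features, r)
    }
```

For instance, `classify(ENTRIES, "noun", ["gender"])` gives `{("m",): ["стол", "вол"], ("f",): ["лампа", "рама"]}`.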

http://www.ice.ge/conferenciebi/Bunebriv%20enata%20damushaveba.html
6th International Symposium on Language, Logic and Computation | Batumi, Georgia | 12-16 September 2005 | The Centre for Language, Logic and Speech at the Tbilisi State University, the Georgian Academy of Sciences and the Institute for Logic, Language and Computation (ILLC) of the University of Amsterdam | Classification of the Russian Morphology | oral

An automatic paradigm generation system developed within a joint Swedish-Georgian project (KTH, Stockholm - ISU AN Georgia, Tbilisi) requires a dictionary of word forms that carries morphological information about each word and supports selecting lists of words with the same characteristics. The dictionary contains approximately 100,000 words. Each word carries grammatical marks and indices that specify its own declension or conjugation scheme. In addition, the dictionary uses so-called reverse word order, so that words with the same declension (or conjugation) information are mostly adjacent, which makes it easier to group them into classes with identical characteristics. This approach is widely used by linguists building automatic processors.

Several problems had to be solved to prepare the dictionary for use. The system formalizes the recognition of morphological characteristics from the notation used in the entries of the dictionary of Russian lexical units. The dictionary distinguishes 24 main features. For each dictionary unit, a row is stored in the "RusLex" database, whose structure allows further processing of the dictionary.
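The actual RusLex schema is not given, so the following Python sketch only illustrates the idea of turning entry notation into one database row per unit. It assumes a simplified entry format `"<word> <mark> <digit><letter>"` (e.g. `"лампа ж 1a"`), an in-memory `sqlite3` table, and hypothetical column names; the index values used in the usage note are illustrative.

```python
import re
import sqlite3

# Simplified entry format assumed here: "<word> <mark> <digit><letter>",
# e.g. "лампа ж 1a". Real entries carry many more marks (24 features).
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\d)([a-f])$")


def parse_entry(line: str) -> tuple[str, str, int, str]:
    """Split one simplified entry into (word, mark, stem type, stress letter)."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    word, mark, stem_type, stress = m.groups()
    return word, mark, int(stem_type), stress


def build_ruslex(lines: list[str]) -> sqlite3.Connection:
    """Store one row per dictionary unit in an in-memory RusLex table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RusLex (word TEXT, mark TEXT, stem_type INTEGER, stress TEXT)"
    )
    conn.executemany("INSERT INTO RusLex VALUES (?, ?, ?, ?)", map(parse_entry, lines))
    return conn
```

Once the rows exist, a selection by any feature combination is an ordinary query, e.g. `conn.execute("SELECT word FROM RusLex WHERE mark = ? AND stress = ? ORDER BY word", ("ж", "a"))`.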

We have created the application program ZalDict, which runs under Windows 95 and later. The system is intended for research purposes: with its search modes, selecting words by one or more features, a task that would take many hours by hand, takes only a few seconds.

The main advantage of the system is that it can classify the dictionary by any combination of given features, so that classes of words sharing the same declension (conjugation) scheme can be found.

Natural Language Processing: The Georgian Language and Computer Technologies | Tbilisi, Georgia | 21-23 June 2004 | Arnold Chikobava Institute of Linguistics | oral

In the morphological processors of our analysis, synthesis and tagging systems we use the so-called method of "record and reproduction of morphological functions" (MorZaVo). The main advantage of the method is its universality: the MorZaVo algorithms are not designed to process particular morphological categories and their markers, but to recognize, record and reproduce these categories together with the morphological devices and markers that express them. The MorZaVo model makes it easy to populate dictionaries for different languages, and with it we have addressed the most difficult task of automatic translation, although so far only the morphological level has been implemented. The analysis, synthesis and tagging blocks of the system process texts and find the relevant information for each of their elements, which will further allow us to address the syntactic and semantic problems of automatic translation as well. On the basis of the MorZaVo model we have developed a "Georgian-Russian-English word-to-word interpreter", which is integrated into the GeoTrans user application. Its main purpose is to find, for a word in any of the languages, the basic forms of all its correspondences in the other languages. Adapting the MorZaVo system raised difficulties in establishing correspondences between elements of the source and target languages: the one-to-one correspondence of lexical items broke down, so that a basic form in one language did not always match a single basic form in another. A compound in one language is not necessarily rendered in another by combining word forms with the corresponding meanings; it may be expressible only by an idiomatic combination.
For example, the Georgian and Russian correspondences of English compounds beginning with over- (overnight, overlap, overcast, overawe, etc.) are mostly expressed by several words. The picture is similar for many Georgian compound words (e.g. "subordinate", kargia, "door-to-door", "closed-door", "aunt", "stepfather"). It therefore became necessary to algorithmize the notation and "recording" of compounds, constructions and idioms in the correspondence tables. When the correspondence tables supplied to the system are "recorded", the REVFORM (formatting) operator selects all possible variants and saves them in the "Multilingual Dictionary Matching" database; later, when the system's "Interpreter" block runs, the CREFORM (form compiler) operator builds the corresponding component, for example an idiom.
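The REVFORM and CREFORM operators are not specified in detail, so the following is not their implementation. It is a hypothetical Python sketch of one common way to handle many-to-one and one-to-many correspondences: a table keyed by token tuples (`TABLE`, with illustrative English-Russian entries) and a greedy longest-match lookup (`interpret`), which lets a multiword unit such as "by heart" win over its single words.

```python
# Hypothetical correspondence table keyed by tuples of source-language tokens.
# It illustrates one-to-many and many-to-one correspondences only; it is not
# the REVFORM/CREFORM mechanism described above. Tokens are not lemmatized.
TABLE = {
    ("learn",): "учить",
    ("by", "heart"): "наизусть",
    ("overnight",): "за ночь",
}
MAX_LEN = max(len(key) for key in TABLE)


def interpret(tokens: list[str]) -> list[str]:
    """Greedy longest-match lookup; unknown tokens are passed through unchanged."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible multiword unit first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i : i + n])
            if key in TABLE:
                out.append(TABLE[key])
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Here `interpret(["learn", "by", "heart"])` yields `["учить", "наизусть"]`: two English words collapse into one Russian word, while "overnight" expands into two.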

http://www.ice.ge/conferenciebi/Bunebriv%20enata%20damushaveba.html
Problems of Control and Power Engineering (Проблемы Управления и Энергетики), PCPE-2004 | Tbilisi, Georgia | 27 September - 1 October 2004 | Georgian-Russian-English literal interpretation | oral

Bfp is a morphological generative system (Basic form→Paradigm) that generates the whole paradigm (P) for each input basic form (Bf). So far it has been implemented and tested only on Russian verb paradigms. However, the system is already connected to the Printing Support System (PSS) and is intended to serve as its morphological component. Because of PSS restrictions, it will display on screen, one after another, subsets of verb paradigms with fewer than 10 members each. We hypothesize that the natural language mechanism may function according to a somewhat similar scheme.
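As a rough illustration of the display constraint, here is a Python sketch that generates one Russian verb paradigm and yields it in consecutive subsets of at most 9 forms. The single-class ending table `ENDINGS` and the functions `generate` and `pages` are hypothetical simplifications; the forms produced for читать are its regular paradigm, but Bfp itself is not described at this level of detail.

```python
from typing import Iterator

# Hypothetical ending table for a single conjugation class; applied to
# читать it produces that verb's regular forms, but the layout is a sketch.
ENDINGS = [
    ("ть", "inf"),
    ("ю", "pres.1sg"), ("ешь", "pres.2sg"), ("ет", "pres.3sg"),
    ("ем", "pres.1pl"), ("ете", "pres.2pl"), ("ют", "pres.3pl"),
    ("л", "past.m"), ("ла", "past.f"), ("ло", "past.n"), ("ли", "past.pl"),
    ("й", "imp.sg"), ("йте", "imp.pl"),
]


def generate(bf: str) -> list[tuple[str, str]]:
    """Bf -> P for this one class: strip the infinitive ending and re-inflect."""
    stem = bf.removesuffix("ть")
    return [(stem + ending, tag) for ending, tag in ENDINGS]


def pages(paradigm: list[tuple[str, str]], size: int = 9) -> Iterator[list[tuple[str, str]]]:
    """Yield consecutive subsets of the paradigm, each with fewer than 10 members."""
    for start in range(0, len(paradigm), size):
        yield paradigm[start : start + size]
```

The 13 forms of `generate("читать")` are shown as two screens of 9 and 4 forms.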

https://gtu.ge/msi/Files/Pdf/Publications/sarchevi_2004.pdf
5th Tbilisi Symposium on Language, Logic and Computation | Tbilisi, Georgia | 6-10 October 2003 | The Centre for Language, Logic and Speech at the Tbilisi State University, the Georgian Academy of Sciences and the Institute for Logic, Language and Computation (ILLC) of the University of Amsterdam | Record and reproduction of morphological functions | oral

Today's linguistic research clearly requires computer recording and processing of the object under investigation. The main purpose of our system is text processing, with the final (though not the only) aim of building automatic multilingual dictionaries. The system can be used for:

§ Alignment of text;

§ Record indexing;

§ Division of text into sentences;

§ Calculation of word frequencies;

§ Simultaneous processing of a practically unlimited number of parallel texts;

§ Addition of new languages;

§ Creation of new dictionaries and enhancement of those already in the system;

§ Word analysis;

§ Finding and saving synonymous equivalents;

§ Comparison of different uses of words.


The core of this approach is the so-called method of "record and reproduction" of morphological characteristics and of the rules for generating word inflections. During analysis, the already recorded rules are "reproduced" and fitted to the input word form, the basic form thus obtained is compared with the dictionary list, and, if the process succeeds, the result is displayed on screen. Each time the system cannot find in the dictionary a basic form corresponding to the current input word, it asks the user to type the characteristics of this lexical unit or to define its whole paradigm. The system then automatically creates the corresponding rule by means of the "record" operator, compares it with the existing rules and, if no identical rule is found, adds the new rule to the list.
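A minimal Python sketch of the record/reproduce loop, under assumptions that are ours and not the author's: a rule is represented as (basic-form ending, tuple of word-form endings) relative to the longest common prefix of a user-supplied paradigm (computed with `os.path.commonprefix`), and the functions `record` and `reproduce` are hypothetical names. An empty result from `reproduce` corresponds to the point where the system would ask the user for a paradigm.

```python
import os

# Hypothetical rule format: (basic-form ending, tuple of word-form endings),
# taken relative to the longest common prefix of a full paradigm.


def record(paradigm: list[str], rules: list) -> tuple:
    """Derive a rule from a paradigm (basic form first); add it only if new."""
    stem = os.path.commonprefix(paradigm)
    rule = (paradigm[0][len(stem):], tuple(form[len(stem):] for form in paradigm))
    if rule not in rules:
        rules.append(rule)
    return rule


def reproduce(wf: str, rules: list, dictionary: set) -> list[str]:
    """Fit every recorded rule to wf and keep basic forms found in the dictionary."""
    found = []
    for bf_ending, endings in rules:
        for ending in endings:
            if wf.endswith(ending):
                candidate = wf[: len(wf) - len(ending)] + bf_ending
                if candidate in dictionary and candidate not in found:
                    found.append(candidate)
    return found
```

Recording the paradigms of лампа and рама yields a single shared rule, after which `reproduce("рамой", rules, {"лампа", "рама"})` returns `["рама"]`, while an unknown word such as "кошкой" returns an empty list.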

The system is implemented by means of a network representation of morphological processors. It can supply records of the conditions for both the right-hand and the left-hand labels of the morphological network, and thus further automate this rather labour-intensive part of implementing morphological processors.


https://archive.illc.uva.nl/Tbilisi/Tbilisi2003/

Web of Science: 1
Scopus: 1,5
Google Scholar: 2

- 5 March - 15 May 2006 | Institution for Speech, Music and Hearing, Kungliga Tekniska Högskolan (KTH) | Private company Honeysoft
- 8 August - 7 September 2008 | Institution for Speech, Music and Hearing, Kungliga Tekniska Högskolan (KTH) | Private company Honeysoft

Participation in a project / grant funded by an international organization


Algorithmic Description of Russian Word Morphology | The Royal Swedish Academy of Sciences, Sweden | 2005-2006 | Key Personnel
Volkswagen Stiftung, AZ 86154 | Germany | 2012-2015 | Key Personnel

Participation in a project / grant funded from the state budget


English-Georgian computer dictionary with attached morphological processors | Grant of the Georgian Academy of Sciences | 2002-2003 | Key Personnel
Network representation and computer realization of the morphological level of the generative grammar | Grant of the Georgian Academy of Sciences | 2004-2005 | Key Personnel
Creating automatic syntactic analysis of Georgian text | Archil Eliashvili Institute of Control Systems of the Georgian National Academy of Sciences | 2004-2006 | Key Personnel
English-Georgian automatic translation system | LEPL Archil Eliashvili Institute of Control Systems | 2007-2009 | Key Personnel
Automatic explanatory-combinatorial dictionary as the basis of modeling of the Georgian language | Shota Rustaveli National Science Foundation | 2009-2011 | Key Personnel
Georgian Computer prompter for disabled persons | Shota Rustaveli National Science Foundation of Georgia | 2009-2011 | Principal Investigator

Publication in Scientific Conference Proceedings Indexed in Web of Science and Scopus


Dialect Dictionaries in the Georgian Dialect Corpus, Theoretical Computer Science and General Issues. Publisher: Springer-Verlag Berlin Heidelberg, 2015. pp. 82-96 | Grant Project

The Georgian Dialect Corpus (http://mygeorgia.ge/gdc) is being developed as a tool for studying and documenting the geographical varieties of Georgian. The strategy for developing the GDC drew, on the one hand, on international corpus experience and, on the other, on the traditions of Georgian dialectology and dialectography. In designing the corpus we did our best to take into account the peculiarities of the Georgian national linguistic and cultural space.

In the Georgian Dialect Corpus, dictionaries serve two goals: achieving representativeness and supporting morphological annotation. The present paper describes in detail how these functions are realized.

New texts are continuously being added to the corpus, and the morphological annotation of the data is still in progress; therefore, so far, the corpus can be queried only by the following meta-textual (non-linguistic) features:

• Language and dialect

• Place of recording

• The informant’s identity

• Thematic and chronological features of a text

• Text type (narrative, poetry, conversation…)

The structure of the corpus is entirely determined by the fact that its technological chain covers the whole cycle of text processing, from data recording to integration into the corpus text base. Hence the planning of field work already determines the presence of such corpus components as a block of administrative units and information blocks of chronological, thematic, sociological and other features.

To facilitate the morphological annotation of the corpus, we presented the dialect dictionaries as "partially grammatical" dictionaries and applied them in lemmatization and linguistic annotation. We also decided to use the data of Georgian dialect lexicography to enlarge the lexical database (text base) of the corpus.
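The paper's dictionary format is not reproduced here; as a hedged illustration only, a "partially grammatical" dictionary can be pictured as a map from word form to lemma plus a little grammatical information (here just part of speech), used for lookup-based lemmatization. `DIALECT_DICT` and `annotate` are hypothetical, and the entries are ordinary standard-Georgian forms rather than GDC data.

```python
# Hypothetical "partially grammatical" dictionary: word form -> (lemma, POS).
# Entries are illustrative standard-Georgian forms, not GDC data.
DIALECT_DICT = {
    "სახლში": ("სახლი", "noun"),   # "in the house"
    "სახლს": ("სახლი", "noun"),    # dative of "house"
    "წავიდა": ("წასვლა", "verb"),  # "went"; lemma given as the masdar
}


def annotate(tokens: list[str]) -> list[tuple]:
    """Attach lemma and POS where the dictionary knows the form; otherwise
    leave the token unannotated (None, None) for later processing."""
    return [(t, *DIALECT_DICT.get(t, (None, None))) for t in tokens]
```

Unknown forms are kept with `None` values rather than guessed, so partial coverage never introduces wrong annotations.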

https://archive.illc.uva.nl/Tbilisi/Tbilisi2013/
Syntax Annotation of the Georgian Literary Corpus, Theoretical Computer Science and General Issues. Publisher: Springer-Verlag Berlin Heidelberg, 2017 / LNCS 10148, pp. 89-97 | State Target Program

To solve theoretical and applied tasks for the Georgian language, it is very important to build deeply annotated text corpora. While syntactically annotated corpora are now available for English, Czech, Russian and other languages, for Georgian they are rare. The environment developed by our research group offers several NLP applications, including modules for the morphological, syntactic and semantic levels, a Universal Networking Language interface and a natural language interface to SQL databases. In this article we study the automatic syntactic parser of the Georgian language, which covers both the syntactic and the morphological levels of the Georgian language model. The linguistic model for syntactic annotation of Georgian text is based on dependency grammar.
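Since the annotation model is based on dependency grammar, a sentence analysis can be stored as one head index per token. The Python sketch below is illustrative only: the sentence, labels and the `is_tree` check are our own example, not output or code of the parser described above. It verifies that an analysis forms a proper tree (exactly one root, no cycles).

```python
# A dependency analysis as head indices: HEADS[i] is the 1-based index of the
# head of token i+1, and 0 marks the root. Sentence and labels are an
# illustrative example, not output of the parser described above.
TOKENS = ["ბავშვი", "წიგნს", "კითხულობს"]  # "The child is reading a book"
HEADS = [3, 3, 0]
LABELS = ["subj", "obj", "root"]


def is_tree(heads: list[int]) -> bool:
    """True if there is exactly one root and every token reaches it without a cycle."""
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            # Reject cycles and head indices that point outside the sentence.
            if node in seen or not 1 <= node <= len(heads):
                return False
            seen.add(node)
            node = heads[node - 1]
    return True
```

Here both nominal dependents attach to the verb, which is the root, so `is_tree(HEADS)` holds.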

https://archive.illc.uva.nl/Tbilisi/Tbilisi2015/Accepted-abstracts/index.html