Information technologies in linguistics cover the acquisition, storage, transmission, distribution and transformation of language data by means of modern computer programs. They were first introduced in the last century, and today they are developing rapidly. Perhaps soon books will be written entirely on computers, and software translators will become a regular part of everyday life.
Linguistics and information technology: features of their interaction
The first device able to recognize speech on its own appeared in 1952. Since then the field has come a long way. A familiar example is voice navigation in some search engines. The next step is large-vocabulary recognition of spoken language, in which the device converts every word a person says aloud into text. Speech-to-text (STT) technology is still under development, but it already recognizes speech accurately enough to be used in practice.
Support for entering text on electronic media has more recently become part of this technology as well. Among the first applications developed for this purpose were programs that hyphenate words and check spelling automatically (spellers). Today, recognition of printed and handwritten text and autocompletion have become a reality.
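To make the idea of a speller and of autocompletion more concrete, here is a minimal sketch in Python. It assumes a tiny hand-made word list (`DICTIONARY`) and uses the standard library's `difflib.get_close_matches` for fuzzy matching. The function names `suggest` and `complete` are made up for this illustration, and real spellers rely on far larger dictionaries and more refined models.

```python
from difflib import get_close_matches

# Hypothetical toy dictionary; a real speller would load a full word list.
DICTIONARY = ["language", "linguistics", "speech", "text", "translation"]


def suggest(word, dictionary=DICTIONARY, max_suggestions=3):
    """Return the word itself if it is known, otherwise similar dictionary words."""
    lowered = word.lower()
    if lowered in dictionary:
        return [lowered]
    # cutoff=0.7 is an arbitrary similarity threshold chosen for this sketch.
    return get_close_matches(lowered, dictionary, n=max_suggestions, cutoff=0.7)


def complete(prefix, dictionary=DICTIONARY):
    """Return all dictionary words that start with the typed prefix."""
    lowered = prefix.lower()
    return sorted(w for w in dictionary if w.startswith(lowered))


if __name__ == "__main__":
    print(suggest("langauge"))  # misspelled input
    print(complete("t"))        # autocompletion for a typed prefix
```

Here `suggest` plays the role of the spell checker and `complete` the role of autocompletion. Both simply scan the whole word list, which is fine for a demonstration but would be too slow for a large dictionary.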
The first layered machine translation programs were created more than fifty years ago, but people wanted more than a word-for-word translation. Although the field continues to develop, high-quality translation without human participation remains an aspiration, albeit one that is likely to be realised in the future.
The interaction of linguistics and IT is most visible in the following fields:
- information retrieval and indexing of documents;
- text compression, that is, summarisation (abstracting and annotation);
- extraction of facts and knowledge (Information Extraction) based on syntactic analysis;
- recognition of predefined scripts;
- language simplification for specialised databases;
- classification and clustering of texts.
The last of these fields is now particularly popular and is developing rapidly, notably towards the recognition of spam and fake information. The clustering principle is also applied when classifying SMS messages on mobile devices.
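As an illustration of how text classification can be used to recognize spam, here is a minimal sketch of a naive Bayes classifier with add-one smoothing. Note that this is a supervised classifier trained on labelled examples, not a clustering method. The class name `NaiveBayes`, the labels and the four training messages are all invented for this example, so it should be read as a toy and not as how any particular product works.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Toy multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocabulary = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        counts = self.word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training messages, for illustration only.
model = NaiveBayes()
model.train("win a free prize now", "spam")
model.train("free money click now", "spam")
model.train("see you at lunch tomorrow", "ham")
model.train("meeting moved to friday", "ham")

if __name__ == "__main__":
    print(model.classify("free prize"))
    print(model.classify("lunch on friday"))
```

With so little training data the result depends entirely on which words happen to appear in the examples. A practical filter would need thousands of labelled messages and better tokenization.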
Main tasks of linguistic information technology
The most important task here is the creation of language processors. This is the term developers use for systems that automatically analyse and synthesise text or live speech. There are three types of such analysers:
- The morphological analyser determines the grammatical characteristics of a word: its part of speech and the appropriate set of grammemes, such as case, person, number and tense.
- The syntactic analyser takes into account semantics, syntax, vocabulary and word combinations. Its result is a dependency tree.
- The semantic analyser applies syntactic paraphrasing rules and moves to a semantic-syntactic representation of the sentence. Its result is a semantic network, which is automatically compared with the database of an information system.
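The three stages above can be sketched with a toy pipeline. The example below assumes a five-word hand-made lexicon and a single invented attachment rule. All names (`LEXICON`, `Token`, `analyse_morphology`, `build_dependency_tree`, `tree_edges`) are hypothetical, and the final list of edges is only a crude stand-in for a semantic network. Real analysers are far more elaborate.

```python
from dataclasses import dataclass, field

# Hypothetical mini-lexicon: word form -> (lemma, part of speech, grammemes).
LEXICON = {
    "the": ("the", "DET", {}),
    "dog": ("dog", "NOUN", {"number": "singular"}),
    "dogs": ("dog", "NOUN", {"number": "plural"}),
    "barks": ("bark", "VERB",
              {"person": "3", "number": "singular", "tense": "present"}),
    "bark": ("bark", "VERB", {"number": "plural", "tense": "present"}),
}


@dataclass
class Token:
    form: str
    lemma: str
    pos: str
    grammemes: dict
    children: list = field(default_factory=list)  # dependents in the tree


def analyse_morphology(sentence):
    """Stage 1: look up part of speech and grammemes for every word."""
    tokens = []
    for form in sentence.lower().split():
        lemma, pos, grammemes = LEXICON.get(form, (form, "UNKNOWN", {}))
        tokens.append(Token(form, lemma, pos, dict(grammemes)))
    return tokens


def build_dependency_tree(tokens):
    """Stage 2 (toy rule): the first verb is the root, determiners attach
    to the next noun, and everything else attaches to the root."""
    root = next((t for t in tokens if t.pos == "VERB"), None)
    if root is None:
        raise ValueError("this toy parser needs a verb to act as the root")
    pending = []
    for token in tokens:
        if token is root:
            continue
        if token.pos == "DET":
            pending.append(token)
        elif token.pos == "NOUN":
            token.children.extend(pending)
            pending = []
            root.children.append(token)
        else:
            root.children.append(token)
    root.children.extend(pending)  # determiners that found no noun
    return root


def tree_edges(head):
    """Stage 3 stand-in: flatten the tree into (head lemma, dependent lemma) edges."""
    edges = []
    for child in head.children:
        edges.append((head.lemma, child.lemma))
        edges.extend(tree_edges(child))
    return edges


if __name__ == "__main__":
    tokens = analyse_morphology("The dog barks")
    print([(t.form, t.pos, t.grammemes) for t in tokens])
    print(tree_edges(build_dependency_tree(tokens)))
```

The point of the sketch is the flow of data: each stage consumes the output of the previous one, so errors made in morphological analysis propagate into the tree and into anything built on top of it.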
Another important task of such systems is high-quality machine translation using artificial intelligence. This field also includes dictionaries and encyclopedias compiled automatically, without human participation, which are used in search engines for machine translation.
In addition, computational linguistics develops systems that automatically analyse and synthesise speech, that is, natural-language interfaces. To recognize speech in this case, all levels of language must be involved.