Corpus of Spanish offering 100 million+ words
I’ve just opened an email from elcastellano.org reporting on an interview with Dr Mark Davies of Brigham Young University, in which he talks about what his work on a corpus of the Spanish language can offer people interested in written and spoken Spanish.
The Oxford Concise Dictionary of Linguistics by P. H. Matthews defines the term corpus, in the first sentence of its entry, as:
Any systematic collection of speech or writing in a language or variety of a language.
Spanish possesses a vast oral and written corpus which, with the help of new information technology and the arduous work of academics like Dr Davies, can now be made available to the general public and to researchers of the Spanish language.
This excellent Corpus del Español (http://www.corpusdelespanol.org/) is an invaluable tool for researching the evolution of the Spanish language, as the documents in its database include a very large amount of historical material going as far back as the 1200s.
Any word, phrase, or combination of words, in any given form, can be searched for at the Corpus del Español website. Apart from tracing the history of a particular language structure, users can also search for terms as they are used in academic writing, the news, fiction and spoken language.
As with the corpus of any language, Dr Davies’ work is complex, and its mechanisms are difficult to explain in a brief post like this. The best approach, in my opinion, is to spend some time at his website and follow the instructions given there.
I’ve only been able to have a quick look at this website, but I’m pretty sure that I’ll be using it on a regular basis. I’m adding it right now to my links here.
The Corpus of Spanish by Dr Davies is a primary resource for anyone who wants to study in detail the historical, syntactic, and semantic aspects of the Spanish language.