
ABSTRACT

        This four-part study reports character frequencies for letters, numerals, and special characters, using large samples of European languages from the Multilingual Corpus 1 compact disc published by the European Corpus Initiative of the Association for Computational Linguistics. Part One has frequency tables for Spanish. Part Two has tables for French, Italian, Portuguese, Latin, and Greek. Part Three will similarly treat English, German, Dutch, Norwegian, and Swedish. Part Four will include frequencies for the remaining European languages in the monolingual Corpus folders: Albanian, Bulgarian, Czech, Estonian, Gaelic, Lithuanian, Maltese, Russian, Serbian, and Turkish. Except for Bulgarian, Estonian, and Latin, each sample contains at least one million consecutive characters. For each language, the main tables give combined and separate counts for upper- and lower-case letters, with and without diacritical marks (accents), as well as tables for numerals and "special" characters.
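The kind of tally described above (separate and combined upper/lower-case letter counts, with and without marks, plus numerals and special characters) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, marks are removed by Unicode NFD decomposition, whitespace is not counted as a special character, and the first `limit` characters of a text stand in for a sample of consecutive characters.

```python
import unicodedata
from collections import Counter


def strip_marks(ch: str) -> str:
    """Remove combining marks via NFD decomposition (an assumption), e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def char_frequencies(text: str, limit: int = 1_000_000):
    """Hypothetical helper: count characters in the first `limit` consecutive characters."""
    sample = text[:limit]
    exact = Counter()      # letters kept as written: case and marks preserved
    by_letter = Counter()  # letters case-folded and stripped of marks
    numerals = Counter()   # digit characters
    special = Counter()    # everything else except whitespace (assumed definition)
    for ch in sample:
        if ch.isalpha():
            exact[ch] += 1
            by_letter[strip_marks(ch).lower()] += 1
        elif ch.isdigit():
            numerals[ch] += 1
        elif not ch.isspace():
            special[ch] += 1
    return exact, by_letter, numerals, special


exact, letters, nums, special = char_frequencies("Él comió 3 Ñoquis, ¿sí?")
print(letters["o"], exact["Ñ"], letters["n"], dict(special))
```

Note that NFD stripping merges letters that a language may treat as distinct (Spanish ñ is counted under n in the combined table), which is one reason to keep separate counts with and without marks.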

 
Website Designed & Created by Michael Fleischmann of Cosmos Computing, LLC.