English language statistics
Hello, my colleagues and I are students at FEI STU in Bratislava, Slovakia. During the school year we have been working on automated cracking (by computer, with limited user input) of the so-called Zodiac Killer ciphers: http://sjgt.yweb.sk/tim/en/news.htm . As you can see at the link, we were able to crack the Z408 cipher and are currently working on Z340. For this work we need some thorough statistics on the English language, e.g. the most common sentence beginnings, the words that most often follow them, and so on; anything would help. I believe such statistics have been compiled from English corpora, but we can't get them because they are too expensive, or because the people (academics) we asked for help in most cases simply ignored us... So if anyone has access to these stats and is willing to help us in any way, we would be very grateful. Thank you in advance.
Re: English language statistics
Maybe this link will be helpful: CORPORA: 45-400 million words each: free online access
But you probably know it already.
Re: English language statistics
Well, I hadn't found this one :) It looks useful, but we need more specific stats, like the most common sentence beginnings etc.
Re: English language statistics
I understand, but you could try making your own stats there; the site has some tools for that. I know nothing about them though, so I can't help you further. Good luck!
Re: English language statistics
You'll find some common word lists here: Log In - UsingEnglish.com
If you have a look at these links, you might find some stuff:
Corpus Linguistics - ESL Web Directory - UsingEnglish.com
There's an annotated page of links there that does have other lists and corpora, but I am afraid that most of the stuff you're looking for (word combinations, etc.) is not available for free, and is often very expensive.
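If ready-made lists turn out to be too expensive, stats like sentence beginnings and "which word follows which" can also be counted from any plain-text English corpus you already have. Below is a minimal sketch in Python, not a definitive method: the function name `sentence_start_and_bigram_counts` and the sample text are made up for illustration, and the sentence splitting is deliberately naive (it assumes a sentence ends at `.`, `!` or `?` followed by whitespace, so abbreviations like "Mr." will be split wrongly).

```python
import re
from collections import Counter


def sentence_start_and_bigram_counts(text):
    """Count sentence-initial words and adjacent word pairs in `text`.

    Hypothetical helper for illustration; the splitting rules are assumptions.
    """
    starts = Counter()   # how often each word begins a sentence
    bigrams = Counter()  # how often word B directly follows word A
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Lowercase and keep only runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", sentence.lower())
        if not words:
            continue
        starts[words[0]] += 1
        # Pair each word with its successor inside the same sentence.
        bigrams.update(zip(words, words[1:]))
    return starts, bigrams


if __name__ == "__main__":
    sample = "The cat sat. The dog ran. A cat ran."
    starts, bigrams = sentence_start_and_bigram_counts(sample)
    print(starts.most_common(3))
    print(bigrams.most_common(3))
```

To use it on a real corpus, read the file contents into a string and pass that in instead of `sample`; `most_common(n)` then gives the top `n` sentence beginnings or word pairs. The quality of the numbers depends entirely on how large and how representative the text is.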