Our own Text Analyser might be able to handle that, though I've never tried it with so big a text. The stats it generates aren't configurable (at least, they weren't when I last looked). It sounds like the sort of task that used to be done in SNOBOL, the old string-processing language (though I imagine there's a Java class that would help).