Piping a file whose insignificant whitespace has been replaced with binary 0 into this script, a wrapper to freq.py... produces this (the exact letters and values depend on the input, and note that the raw count has been limited to the ten most frequent letters and -combos)
Pre-defined letter-combinations Letter frequencies: a 14.3% r 8.6% e 5.8% s 5.4% ' 5.4% t 4.2% i 4.2% v 4.0% u 3.9% n 3.9% c 3.1% y 2.8% k 2.8% l 2.6% d 2.5% aì 2.4% x 2.0% g 2.0% n: 1.7% o 1.5% m 1.1% a: 1.1% r: 1.1% eì 1.0% j 0.9% i: 0.9% f 0.9% l: 0.8% þ 0.8% ð 0.6% p 0.6% uì 0.5% u: 0.5% b 0.5% , 0.5% v: 0.4% q 0.4% aò 0.4% z 0.3% o: 0.3% c: 0.3% H 0.3% yé 0.2% yá 0.2% s: 0.2% f: 0.2% yú 0.1% yè 0.1% x: 0.1% uí 0.1% uà 0.1% oé 0.1% m: 0.1% k: 0.1% aî 0.1% aá 0.1% q: 0.1% oò 0.1% oì 0.1% oá 0.1% j: 0.1% iá 0.1% ià 0.1% g: 0.1% eò 0.1% eà 0.1% aé 0.1% Pre-defined letter-combinations occuring initially... Letter frequencies: s 11.0% t 10.2% v 6.4% k 6.0% a 5.8% r 5.4% x 5.2% d 4.4% c 4.4% ' 4.2% g 3.6% u 2.8% m 2.6% aì 2.6% i 2.2% f 2.2% e 2.2% ð 1.4% o 1.4% þ 1.2% y 1.2% p 1.2% l 1.2% b 1.2% j 1.0% , 1.0% n 0.8% eì 0.8% z 0.6% yé 0.6% q 0.6% H 0.6% yá 0.4% uí 0.4% oé 0.4% uì 0.2% u: 0.2% r: 0.2% oò 0.2% oì 0.2% oá 0.2% n: 0.2% k: 0.2% iá 0.2% i: 0.2% aò 0.2% aá 0.2% a: 0.2% Pre-defined letter-combinations occuring finally... Letter frequencies: a 12.2% r 11.2% n 11.2% e 5.4% n: 4.8% l 4.6% ' 4.6% i 4.4% u 4.2% s 3.4% y 2.8% aì 2.8% o 2.2% c 2.2% v 2.0% d' 2.0% x 1.6% l: 1.6% eì 1.6% uì 1.4% u: 1.4% i: 1.2% þ 1.0% ge 1.0% a: 1.0% o: 0.8% m 0.8% aò 0.8% f: 0.6% b' 0.6% ð 0.4% v: 0.4% r: 0.4% q 0.4% p 0.4% H 0.4% z 0.2% yú 0.2% yè 0.2% t 0.2% s: 0.2% oì 0.2% f 0.2% c: 0.2% aî 0.2% Counted letters Letter frequencies: a 16.0% r 8.3% : 6.9% e 6.0% s 4.9% n 4.9% ' 4.7% u 4.5% i 4.4% v 3.8% Counted letter-combinations of two letters Letter frequencies: ar 3.9% aì 2.8% an 2.4% n: 2.0% sï 1.7% :a 1.6% ra 1.6% ta 1.5% 'a 1.5% en 1.4% Counted letter-combinations of three letters Letter frequencies: an: 1.2% aìr 1.1% ri: 0.9% ren 0.9% en: 0.9% ar: 0.9% a:r 0.7% vyc 0.6% tca 0.6% el: 0.6%
The actual counting is done by the freq-class.