Commit 9703f6d4 authored by Dos Santos David

update numbers for cacm

parent cdb05e19
@@ -9,22 +9,22 @@ Here is the analysis obtained for the CACM collection:
 ```
 ****************** Count tokens ******************
-Total count of tokens : 108,447
-Vocabulary size: 11,627
+Total count of tokens : 110,398
+Vocabulary size: 9,497
 ****** Count tokens for half the collection ******
-Total count of tokens : 30,052
-Vocabulary size: 6,049
+Total count of tokens : 30,672
+Vocabulary size: 5,299
 ******** Heap's law parameters estimation ********
-b: 0.509
-k: 31.7
+b: 0.456
+k: 47.9
-estimation of vocabulary size for 1M tokens : 36034
+estimation of vocabulary size for 1M tokens : 25917
 ```
 Graphs for Zipf's law:
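The reported parameters appear consistent with a two-point fit of Heaps' law, M = k * T^b, through the half-collection and full-collection measurements: b = log(M2 / M1) / log(T2 / T1) and k = M2 / T2^b. The commit does not show the estimation code, so the Python below is only a minimal sketch under that assumption; `heaps_params` and `predict_vocab` are hypothetical names, not taken from the project.

```python
import math


def heaps_params(t1, m1, t2, m2):
    """Fit Heaps' law M = k * T**b exactly through two (tokens, vocabulary) points.

    Assumption: (t1, m1) is the half-collection count and (t2, m2) the
    full-collection count; this is an illustrative reconstruction only.
    """
    # Slope of the line through the two points in log-log space.
    b = math.log(m2 / m1) / math.log(t2 / t1)
    # Choose k so the curve passes through the full-collection point.
    k = m2 / t2 ** b
    return b, k


def predict_vocab(k, b, tokens):
    """Extrapolate the vocabulary size for a given number of tokens."""
    return k * tokens ** b


# New CACM counts from the commit: half collection, then full collection.
b, k = heaps_params(30_672, 5_299, 110_398, 9_497)
print(f"b: {b:.3f}")
print(f"k: {k:.1f}")
print(f"estimation of vocabulary size for 1M tokens : {predict_vocab(k, b, 1_000_000):.0f}")
```

With the new counts this gives b ≈ 0.456 and k ≈ 47.9, and a 1M-token estimate close to the reported 25,917; feeding in the old counts (30,052 / 6,049 and 108,447 / 11,627) likewise lands near b = 0.509, k = 31.7 and 36,034. A two-point fit keeps the computation trivial, but a least-squares fit of log M against log T over several collection prefixes would be less sensitive to any single measurement.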