Commit d5248def authored by Dos Santos David

add graphs

parent d8b1ebce
@@ -25,7 +25,6 @@ b: 0.509
k: 31.7
estimation of vocabulary size for 1M tokens : 36034
```
#### Collection CS276
@@ -35,20 +34,25 @@ Here is the analysis obtained for the CS276 collection
```
****************** Count tokens ******************
Total count of tokens : 17,879,253
Vocabulary size: 337,191
Total count of tokens : 25,498,340
Vocabulary size: 347,071
****** Count tokens for half the collection ******
Total count of tokens : 9,958,569
Vocabulary size: 191,499
Total count of tokens : 14,332,579
Vocabulary size: 196,989
******** Heap's law parameters estimation ********
b: 0.967
k: 0.0328
b: 0.983
k: 0.0181
estimation of vocabulary size for 1M tokens : 20755
estimation of vocabulary size for 1M tokens : 14374
```
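The Heaps' law parameters printed above appear to come from fitting V = k·N^b through two measurements: the full collection and half of it. Taking logarithms gives log V = log k + b·log N, so b is the slope between the two points and k follows from either one. The following is a minimal Python sketch of that two-point fit. It assumes this is the method used, and the function names are hypothetical. With the updated CS276 counts it gives b ≈ 0.983, k ≈ 0.0181 and roughly 14,374 terms for 1M tokens, which matches the output. The earlier counts likewise give about 20,755.

```python
import math


def heaps_parameters(n_full, v_full, n_half, v_half):
    """Fit V = k * N**b through two (tokens, vocabulary) measurements.

    Assumption: the parameters are fitted from the full collection and
    half of it. In log space, log V = log k + b * log N is a straight
    line, so b is its slope and k is recovered from one of the points.
    """
    b = math.log(v_full / v_half) / math.log(n_full / n_half)
    k = v_full / n_full ** b
    return k, b


def predict_vocabulary(k, b, n_tokens):
    """Heaps' law prediction of the vocabulary size for n_tokens tokens."""
    return k * n_tokens ** b


if __name__ == "__main__":
    # Counts copied from the updated CS276 output above.
    k, b = heaps_parameters(25_498_340, 347_071, 14_332_579, 196_989)
    print(f"b: {b:.3f}")
    print(f"k: {k:.4f}")
    estimate = predict_vocabulary(k, b, 1_000_000)
    print(f"estimation of vocabulary size for 1M tokens : {estimate:.0f}")
```

Using unrounded b and k in the prediction matters here. With b so close to 1, rounding b to three decimals shifts the 1M-token estimate by several hundred terms.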
Graphs for Zipf's law:
![zipf_law](/graphs/cs276_zipf_law.png)
![zipf_law_logs](/graphs/cs276_zipf_law_logs.png)
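The code that produced these graphs is not shown. The following is a minimal sketch, assuming the plots show term frequency against rank. It computes the rank/frequency pairs behind such a plot and the least-squares slope on the log-log scale. Zipf's law (f ∝ 1/r) predicts a slope close to -1. The function names are hypothetical, and the plotting step itself is left out.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, rank 1 being the most frequent term."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    if len(pairs) < 2:
        raise ValueError("need at least two distinct terms")
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic corpus where the term of rank r occurs 60 / r times,
    # i.e. an exact Zipf distribution: 60, 30, 20, 15, 12, 10.
    tokens = [f"w{r}" for r in range(1, 7) for _ in range(60 // r)]
    pairs = rank_frequency(tokens)
    print(pairs)
    print(f"log-log slope: {zipf_slope(pairs):.3f}")
```

On a real collection, the slope depends on which ranks are included. The very rare terms at the tail usually bend the log-log curve away from a straight line.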