Commit d5248def authored by Dos Santos David

add graphs

parent d8b1ebce
@@ -25,7 +25,6 @@ b: 0.509
 k: 31.7
 estimation of vocabulary size for 1M tokens : 36034
 ```
 #### Collection CS276
@@ -35,20 +34,25 @@ Here is the analysis obtained for the CS276 collection
 ```
 ****************** Count tokens ******************
-Total count of tokens : 17,879,253
-Vocabulary size: 337,191
+Total count of tokens : 25,498,340
+Vocabulary size: 347,071
 ****** Count tokens for half the collection ******
-Total count of tokens : 9,958,569
-Vocabulary size: 191,499
+Total count of tokens : 14,332,579
+Vocabulary size: 196,989
 ******** Heap's law parameters estimation ********
-b: 0.967
-k: 0.0328
-estimation of vocabulary size for 1M tokens : 20755
+b: 0.983
+k: 0.0181
+estimation of vocabulary size for 1M tokens : 14374
 ```
+Graphs for Zipf's law:
+![zipf_law](/graphs/cs276_zipf_law.png)
+![zipf_law_logs](/graphs/cs276_zipf_law_logs.png)
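
The diff does not show how `b` and `k` are estimated. One plausible reading, given that the output reports counts for the full collection and for half of it, is that Heaps' law `M = k * T^b` is solved exactly through those two (tokens, vocabulary) points. The sketch below works under that assumption; the function name `fit_heaps` and its parameters are hypothetical and not taken from the repository.

```python
import math


def fit_heaps(t_full, m_full, t_half, m_half):
    """Solve M = k * T**b through two (tokens, vocabulary) points.

    Hypothetical helper: the project's actual estimation code is not
    shown in the diff, so this is only one way to obtain b and k.
    """
    # Dividing the two equations eliminates k and leaves b.
    b = math.log(m_full / m_half) / math.log(t_full / t_half)
    # Substitute b back into the full-collection equation to get k.
    k = m_full / t_full ** b
    return k, b


# New CS276 counts from the diff (full collection, then half).
k, b = fit_heaps(25_498_340, 347_071, 14_332_579, 196_989)
estimate_1m = k * 1_000_000 ** b

print(f"b: {b:.3f}")
print(f"k: {k:.4f}")
print(f"estimation of vocabulary size for 1M tokens : {estimate_1m:.0f}")
```

With the new counts, this two-point fit gives values close to the reported `b: 0.983`, `k: 0.0181`, and a 1M-token estimate of roughly 14,374, which is consistent with the assumption but does not confirm it.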