#
# pl language letter distributions for Lexica
#
# Automatically generated by a genetic algorithm run with 1000 iterations:
# - Each iteration, 100 Lexica boards of size 4 x 4 were generated.
# - The fitness function looks at how many words can be played on each of these boards.
# - Fitness is defined as "(Mean * Mean) - (Standard Deviation * Standard Deviation)".
#
# The goal is boards that, on average, have a high number of words available (boards with zero or only a small number of words are not good),
# but with a low standard deviation (some languages tend to result in boards with hundreds of words, which is also not particularly great).
#
# Fitness for this probability:
# Min: 40, mean: 178, max: 355, stddev: 69, score: 21995
#
ł 10 1
ń 3 1
ą 11 1 1
ć 4 1
ę 8 1
ś 3 1
a 89 75 51 3
b 4 1 1
c 5 1 1 1
d 8 1 1
e 72 24 2 1
f 6 1 1
g 14 1 1
h 1 1 1
i 74 25 3 1
j 2 1 1
k 67 5 1
l 61 1 1 1
m 85 46 11 1
n 84 42 14
o 95 63 30 1
p 82 9 1
r 76 42 7
ó 7 1
s 64 46 24 1
t 46 17 1
u 75 19 1 1
w 22 4 1 1
y 6 1 1 1
z 3 1 1
ź 1 1
ż 3 1 1
#
# Note: As per https://en.wikipedia.org/wiki/Scrabble_letter_distributions#Polish:
#
# > The letters Q, V and X have always been absent (since they are used in foreign words), and blank tiles cannot be used to represent these.
#
# Therefore we will not include them in Lexica. There are still some words in the dictionary that include them, but only ~500
# out of a total of ~430,000 words, so removing them from the dictionary to save space is not a concern.
#
# q 3
# v 2
# x 2
#
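#
# For illustration only: below is a minimal Python sketch of the fitness formula described at the top of this file.
# It is not the generator's actual code. The following are all assumptions:
# - the name `fitness`,
# - the `word_counts` input (the number of playable words on each generated board),
# - the use of the population standard deviation.
#
#   from statistics import mean, pstdev
#
#   def fitness(word_counts):
#       # Reward a high average word count while penalizing spread between boards.
#       m = mean(word_counts)
#       s = pstdev(word_counts)  # assumed: population (not sample) standard deviation
#       return m * m - s * s
#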