Title: | Makes Nonsense Words Based on English Letter Frequency Data |
---|---|
Description: | This constructs "words" based on weighted sampling from letter and ngram frequency data in English, as summarised by Peter Norvig. |
Authors: | Fran Barton [aut, cre] |
Maintainer: | Fran Barton <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1 |
Built: | 2024-09-05 15:17:02 UTC |
Source: | https://github.com/francisbarton/lettern |
The count and percentage of all 676 bigrams (pairs of consecutive letters) within Norvig's corpus.
bigram_frequencies
A data frame with 676 rows and 4 variables:
bigram_1
character. The first letter of the bigram.
bigram_2
character. The second letter of the bigram.
count
double. Overall count of this bigram within corpus.
percentage
double. Overall percentage of this bigram within corpus.
https://www.norvig.com/mayzner.html
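As a rough illustration of how a table of this shape could drive letter-by-letter sampling, the sketch below builds a stand-in data frame with the same four columns and draws a second letter given a first. The counts are random placeholders rather than Norvig's figures, and `toy_bigrams` and `next_letter()` are hypothetical names that are not part of the package.

```r
# Stand-in for bigram_frequencies: all 26 x 26 = 676 letter pairs.
# The counts are random placeholders, NOT Norvig's data.
set.seed(1)
toy_bigrams <- expand.grid(
  bigram_2 = letters,
  bigram_1 = letters,
  stringsAsFactors = FALSE
)[, c("bigram_1", "bigram_2")]
toy_bigrams$count <- runif(nrow(toy_bigrams), min = 1, max = 1000)
toy_bigrams$percentage <- 100 * toy_bigrams$count / sum(toy_bigrams$count)

# Hypothetical helper: pick a following letter, weighted by bigram count.
next_letter <- function(first, bigrams) {
  rows <- bigrams[bigrams$bigram_1 == first, ]
  sample(rows$bigram_2, size = 1, prob = rows$count)
}

nrow(toy_bigrams)              # one row per ordered pair of letters
next_letter("q", toy_bigrams)  # a single lower-case letter
```

Passing raw counts to `prob` is enough here, because `sample()` normalises the weights itself.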
Build a full sentence of n nonsense "words"
build_a_sentence(n, end = ".", ...)
n |
The number of words to include in the sentence. |
end |
The string to end the sentence with. Defaults to a full stop. |
... |
A place to pass on parameters such as |
A single string of n words with an ending character such as a full stop.
build_a_sentence(6, cutoff = 0.005)
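To show what the `end` argument contributes, here is a minimal sketch of the final assembly step only. `toy_sentence()` is a hypothetical helper and the three "words" are made-up placeholders; how the package generates or capitalises its words is not covered here.

```r
# Hypothetical helper: join already-generated "words" into one string
# and append the ending string.
toy_sentence <- function(words, end = ".") {
  paste0(paste(words, collapse = " "), end)
}

toy_sentence(c("thon", "ared", "ofin"), end = "!")
# "thon ared ofin!"
```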
Choose a letter based on frequency (or frequency at position n within a word)
choose_a_letter(n)
n |
If n is a non-zero integer between -7 and 7 (position in word, from 1 (first) to 7 (seventh) or from -1 (last) to -7 (seventh from last)), it returns a single letter based on letter frequencies at that position only. |
A lower-case letter (character string of nchar 1) between a and z.
choose_a_letter(3)
From Norvig: "Now we show the letter frequencies by position within word. That is, the frequencies for just the first letter in each word, just the second letter, and so on. We also show frequencies for positions relative to the end of the word: "-1" means the last letter, "-2" means the second to last, and so on."
letter_frequencies_by_position
A data frame with 364 rows and 4 variables:
position
integer. Letter's position within a word.
letter
character. The 26 letters of the English alphabet (lower case).
count
double. Overall count of this letter within corpus, at this position within a word.
percentage
double. Overall percentage of this letter within corpus, at this position within a word.
https://www.norvig.com/mayzner.html
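The row count is consistent with one row per position and letter: 14 positions (1 to 7 and -1 to -7) times 26 letters gives 364. The sketch below rebuilds that grid and samples a letter for one position. It assumes this layout, uses uniform placeholder counts instead of Norvig's data, and `toy_grid` and `letter_at()` are hypothetical names.

```r
# Positions 1..7 count from the start of a word, -1..-7 from the end.
positions <- c(1:7, -(1:7))

# One row per (position, letter) pair: 14 positions x 26 letters.
toy_grid <- expand.grid(
  letter = letters,
  position = positions,
  stringsAsFactors = FALSE
)
# Placeholder counts (uniform), NOT Norvig's data.
toy_grid$count <- 1

# Hypothetical helper: sample one letter using only the rows for position n.
letter_at <- function(n, grid) {
  rows <- grid[grid$position == n, ]
  sample(rows$letter, size = 1, prob = rows$count)
}

nrow(toy_grid)           # 364
letter_at(-1, toy_grid)  # a letter weighted by "last position" counts
```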
The count and percentage of each of the 26 letters of the English alphabet within Norvig's corpus.
letter_frequencies_overall
A data frame with 26 rows and 3 variables:
letter
character. The 26 letters of the English alphabet (lower case).
count
double. Overall count of this letter within corpus.
percentage
double. Overall percentage of this letter within corpus.
https://www.norvig.com/mayzner.html
This function returns n letters drawn by weighted sampling from all
26 letters, based on their overall frequency in Norvig's corpus.
The replace parameter is fixed to TRUE, as this is what makes sense given
the frequency-dependent nature of this particular sampling approach.
sample_letters(n)
n |
The number of letters to return. |
A vector of lower-case letters.
sample_letters(3)
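For readers wondering why sampling with replacement is the natural choice here, the following base R sketch may help. It is not the package's implementation: `toy_weights` holds made-up weights rather than Norvig's frequencies, and `toy_sample_letters()` is a hypothetical stand-in.

```r
# Illustrative weights only, NOT Norvig's frequencies:
# make a few common letters five times as likely as the rest.
toy_weights <- rep(1, 26)
names(toy_weights) <- letters
toy_weights[c("e", "t", "a", "o")] <- 5

# Hypothetical stand-in for sample_letters(): each draw is independent,
# so the same letter can appear more than once (replace = TRUE).
toy_sample_letters <- function(n, weights = toy_weights) {
  sample(names(weights), size = n, replace = TRUE, prob = weights)
}

set.seed(42)
toy_sample_letters(30)  # more than 26 draws is only possible with replacement
```

Without replacement, each letter could be drawn at most once, so the weights would shift after every draw and requests for more than 26 letters would fail.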
From Norvig: "Here is the breakdown of mentions (in millions) by word length (looking like a Poisson distribution)."
word_length_frequencies
A data frame with 23 rows and 3 variables:
word_length
double. Word length.
count_millions
double. Count of words of this length within corpus (in millions).
percentage
double. Percentage of words of this length within corpus.
https://www.norvig.com/mayzner.html
Writes a poem of pure gibberish
write_a_poem(lines, mean_line_length = 7, cat = TRUE, ...)
lines |
The number of lines per stanza. A single integer returns a single stanza of this many lines. A vector of multiple integers, of length n, will return a poem of n stanzas, with lengths as given in the vector. |
mean_line_length |
Line lengths will be generated at random from a normal distribution around this mean, with SD equal to 1 by default. |
cat |
Boolean. Whether to spew the poem straight to stdout via |
... |
A place to pass on parameters such as |
A beautiful poem (character strings concatenated with line breaks)
write_a_poem(c(4, 4), cutoff = 0.01)
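The way `lines` and `mean_line_length` interact can be sketched in base R as follows. This only illustrates the description of those arguments and is not the package's code: `toy_line_lengths()` is a hypothetical helper, and the rounding and the floor of one word per line are assumptions.

```r
# Hypothetical helper: draw a word count for each line of each stanza.
# `lines` is a vector with one entry per stanza (lines in that stanza).
# Rounding and the minimum of one word per line are assumptions.
toy_line_lengths <- function(lines, mean_line_length = 7, sd = 1) {
  lapply(lines, function(k) {
    lens <- round(rnorm(k, mean = mean_line_length, sd = sd))
    pmax(1, lens)
  })
}

set.seed(7)
toy_line_lengths(c(4, 4))  # a list of two stanzas, four line lengths each
```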