LIMMEH
- 1 Post
- 5 Comments
Joined 5 months ago
Cake day: February 17th, 2025
Interestingly, the only two hallucinations are:
- Aprch
- Maril

Everything else is always a legit month.
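A quick way to check that claim is to enumerate every start-to-end path of the month graph. This is a sketch that assumes the edge list is transcribed from the Mermaid state diagram, using `febr` as the February start node so that path spells "february"; `walk` is a hypothetical helper, and `month.name` is R's built-in vector of English month names.

```r
# Edges transcribed from the Mermaid state diagram, using "febr" as the
# February start node so that path spells "february".
from <- c("sept", "nov", "dec", "em", "oc", "to", "febr", "uar", "jan", "ju",
          "ju", "l", "ma", "ma", "r", "a", "p", "r", "a", "u")
to   <- c("em", "em", "em", "ber", "to", "ber", "uar", "y", "uar", "ne",
          "l", "y", "r", "y", "ch", "p", "r", "il", "u", "gust")

# Concatenate node labels along every path from `node` down to a sink.
walk <- function(node) {
  nxt <- to[from == node]
  if (length(nxt) == 0) return(node)  # no outgoing edge: the word ends here
  unlist(lapply(nxt, function(n) paste0(node, walk(n))))
}

starts <- setdiff(unique(from), to)    # nodes with no incoming edge
words  <- unlist(lapply(starts, walk))
months <- tolower(month.name)

setdiff(words, months)  # paths that are not real months
# [1] "maril" "aprch"
setdiff(months, words)  # months the graph cannot produce
# character(0)
```

Both hallucinations come from the shared `r` node: `ma-r` and `a-p-r` both feed into `ch` and `il`, so each prefix picks up the other month's ending.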
Hierarchical letter clustering would be my guess, or graph-based clustering using n-grams of length 2-4 as nodes and maximising the number of connections.
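A minimal sketch of the hierarchical option, assuming each month is represented by its set of letter n-grams of length 2-4, compared with Jaccard distance, and clustered with average linkage via base R's `hclust`. The distance and linkage are my assumptions, and `letter_ngrams` is a hypothetical helper.

```r
months <- tolower(month.name)

# Distinct contiguous letter n-grams of length min_n..max_n in a word.
letter_ngrams <- function(word, min_n = 2, max_n = 4) {
  out <- character(0)
  for (n in min_n:max_n) {
    if (n > nchar(word)) break
    starts <- seq_len(nchar(word) - n + 1)
    out <- c(out, substring(word, starts, starts + n - 1))
  }
  unique(out)
}

grams <- lapply(months, letter_ngrams)

# Jaccard distance between n-gram sets: 1 - |intersect| / |union|.
k <- length(months)
d <- matrix(0, k, k, dimnames = list(months, months))
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    a <- grams[[i]]
    b <- grams[[j]]
    d[i, j] <- 1 - length(intersect(a, b)) / length(union(a, b))
  }
}

tree <- hclust(as.dist(d), method = "average")
cl <- cutree(tree, h = 0.75)  # clusters formed below distance 0.75
cl
```

Under these assumptions the three "-ember" months are the first to merge (november/december at about 0.67, then september at 0.70), and everything else is still a singleton at that cut. `plot(tree)` draws the full dendrogram.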
Or building an optimized regex and printing out its DFA?
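One common way to "optimize" a word-list regex is to factor it into a prefix trie. This is a sketch of that technique in base R; `trie_regex` is a hypothetical helper, and the `(?:...)` groups need `perl = TRUE`.

```r
months <- tolower(month.name)

# Recursively factor words into a prefix-trie regex, e.g.
# c("june", "july") -> "ju(?:ly|ne)". Hypothetical helper, not a library call.
trie_regex <- function(words) {
  has_empty <- any(words == "")        # some word ends at this node
  words <- words[words != ""]
  if (length(words) == 0) return("")
  first <- substr(words, 1, 1)
  branches <- vapply(sort(unique(first)), function(ch) {
    paste0(ch, trie_regex(substring(words[first == ch], 2)))
  }, character(1))
  branches <- unname(branches)
  body <- if (length(branches) == 1) {
    branches
  } else {
    paste0("(?:", paste(branches, collapse = "|"), ")")
  }
  if (has_empty) paste0("(?:", body, ")?") else body
}

rx <- paste0("^", trie_regex(months), "$")
print(rx)
# [1] "^(?:a(?:pril|ugust)|december|february|j(?:anuary|u(?:ly|ne))|ma(?:rch|y)|november|october|september)$"
all(grepl(rx, months, perl = TRUE))                     # TRUE
grepl(rx, c("aprch", "maril", "febuary"), perl = TRUE)  # FALSE FALSE FALSE
```

Note the trie only shares prefixes (`ma`, `ju`, `a`). A minimized DFA would additionally merge shared suffixes such as "ember" and "uary", which is where it starts to resemble the motif graph.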
Edit: quick n-gram analysis (min = 3, max = number of letters in the month):
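R code:

```r
library(ngram)

tmonths = c("january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december")

# Space-separate the letters so ngram treats each letter as a "word",
# collect letter n-grams from length 3 up to the whole name, then glue them back
zzz = lapply(tmonths, function(mon){
  ng = ngram::ngram_asweka(paste(unlist(strsplit(mon, split = "")), collapse = " "),
                           min = 3, max = nchar(mon))
  return(gsub(" ", "", ng))
})

# Count each n-gram across all months and keep those seen more than once
res = sort(table(unlist(zzz)))
res[res > 1]
```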
This gives the following 9 n-grams with a frequency greater than 1:

| n-gram | count |
|---|---|
| ary | 2 |
| uar | 2 |
| uary | 2 |
| emb | 3 |
| embe | 3 |
| ember | 3 |
| mbe | 3 |
| mber | 3 |
| ber | 4 |
As you can see, the two longest recurring motifs are “em-ber” and “uar-y”.
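Those two can also be picked out mechanically by keeping only the recurring n-grams that are not contained in a longer recurring n-gram. A base-R sketch, assuming the same min = 3 / max = nchar setup but counting each n-gram at most once per month (no month repeats an n-gram of length 3+, so the counts match the table); `word_ngrams` is a hypothetical helper.

```r
months <- tolower(month.name)

# Distinct letter n-grams of length min_n..nchar(word).
word_ngrams <- function(word, min_n = 3) {
  out <- character(0)
  for (n in min_n:nchar(word)) {
    starts <- seq_len(nchar(word) - n + 1)
    out <- c(out, substring(word, starts, starts + n - 1))
  }
  unique(out)
}

counts <- table(unlist(lapply(months, word_ngrams)))
recurring <- names(counts)[counts > 1]

# Keep an n-gram only if no other recurring n-gram contains it.
maximal <- Filter(function(g) {
  !any(recurring != g & grepl(g, recurring, fixed = TRUE))
}, recurring)
counts[maximal]
# ember  uary
#     3     2
```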
Using this, I propose the following graph:
Mermaid:

```mermaid
stateDiagram
    direction LR
    sept --> em
    nov --> em
    dec --> em
    em --> ber
    oc --> to
    to --> ber
    febr --> uar
    uar --> y
    jan --> uar
    ju --> ne
    ju --> l
    l --> y
    ma --> r
    ma --> y
    r --> ch
    a --> p
    p --> r
    r --> il
    a --> u
    u --> gust
```
Genuine question:
if you had to split each month name into 3 parts, how would you split them to maximise the overlap between the chosen parts?
- “em” is a good overlap for nov/sept/dec
- “uar” is good for jan/febr
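One way to attack this by computer: read "maximise overlap" as "use as few distinct parts as possible across all 12 months" (my assumption), then improve the splits greedily. Exhaustive search is out, since there are about 1.9 × 10^11 combinations of splits, so this sketch does coordinate descent: re-split one month at a time whenever that lowers the distinct-part count. It finds a local optimum, not necessarily the best one, and nothing in this objective stops it from favouring single letters. All helper names are hypothetical.

```r
months <- tolower(month.name)

# All ways to cut a word into exactly 3 non-empty pieces.
three_splits <- function(word) {
  n <- nchar(word)
  cuts <- combn(n - 1, 2)  # each column holds cut positions i < j
  lapply(seq_len(ncol(cuts)), function(k) {
    i <- cuts[1, k]
    j <- cuts[2, k]
    c(substr(word, 1, i), substr(word, i + 1, j), substr(word, j + 1, n))
  })
}

candidates <- lapply(months, three_splits)

# Pieces used by a choice of split index per month, and how many are distinct.
pieces_for <- function(choice) Map(function(opts, k) opts[[k]], candidates, choice)
n_distinct <- function(choice) length(unique(unlist(pieces_for(choice))))

# Greedy coordinate descent; stops when no single re-split helps.
choice <- rep(1L, length(months))
repeat {
  improved <- FALSE
  for (m in seq_along(months)) {
    scores <- vapply(seq_along(candidates[[m]]), function(k) {
      trial <- choice
      trial[m] <- k
      n_distinct(trial)
    }, integer(1))
    best <- which.min(scores)
    if (scores[best] < n_distinct(choice)) {
      choice[m] <- best
      improved <- TRUE
    }
  }
  if (!improved) break
}

splits <- pieces_for(choice)
for (m in seq_along(months)) {
  cat(months[m], ":", paste(splits[[m]], collapse = " | "), "\n")
}
cat("distinct pieces:", n_distinct(choice), "\n")
```

Adding a minimum part length (for example, rejecting splits with 1-letter parts in `three_splits`) would push it towards answers like "em" and "uar".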
Number 2 works less well if you are off-white.