The weekend only refreshes so much

tetris11@feddit.uk · 1 day ago

number 2 works less well if you are off white

tetris11@feddit.uk · 4 days ago

LIMMEH

tetris11@feddit.uk · edit-2 4 days ago

Interestingly

Aprch
Maril

are the only two hallucinations; everything else is always a legit month

tetris11@feddit.uk · edit-2 4 days ago

Hierarchical letter clustering would be my guess, or graph-based clustering using n-grams of length 2-4 as nodes and maximising the connections between them.

Or using an optimized Regex and printing out the DFA?
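For the regex idea, here is a rough base R sketch of what a hand-factored pattern could look like. The pattern `month_re` is my own guess at a factoring (shared prefixes and suffixes pulled into groups), not the output of any regex optimiser, and it does not print a DFA.

```r
# Hypothetical hand-factored pattern: shared prefixes/suffixes are grouped,
# e.g. "uary" is shared by jan/febr and "ber" by the four autumn/winter months.
month_re <- "^((jan|febr)uary|ma(rch|y)|a(pril|ugust)|ju(ne|ly)|((sept|nov|dec)em|octo)ber)$"

# month.name is a built-in R constant holding the English month names
months <- tolower(month.name)

all_match <- all(grepl(month_re, months))              # every real month matches
bad_match <- grepl(month_re, c("aprch", "maril"))      # made-up words should not

print(all_match)
print(bad_match)
```

Because every alternative is anchored between `^` and `$`, words that mix branches (such as `aprch`) cannot match.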

Edit: Quick n-gram analysis (min = 3, max = number of letters in that month)

R-code

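library(ngram)

tmonths = c("january", "february", "march",
           "april", "may", "june", "july",
           "august", "september", "october",
           "november", "december")

# For each month, treat every letter as a "word" so that ngram_asweka
# returns letter n-grams of length 3 up to the full length of the name
zzz = lapply(tmonths, function(mon){
  ng = ngram::ngram_asweka(paste(unlist(strsplit(mon, split="")), collapse=" "), min=3, max=nchar(mon))
  # Remove the separating spaces again, e.g. "e m b" -> "emb"
  return(gsub(" ", "", ng))
})
# Count how often each n-gram occurs across all months, sorted ascending
res = sort(table(unlist(zzz)))
# Keep only the n-grams that occur more than once
res[res > 1]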

This gives the following 9 n-grams with a frequency greater than 1:

  ary   uar  uary   emb  embe ember   mbe  mber   ber 
    2     2     2     3     3     3     3     3     4

As you can see, the two longest, most common motifs are “em-ber” and “uar-y”.

Using this, I propose the following graph:

Mermaid

stateDiagram
    direction LR
    sept --> em
    nov --> em
    dec --> em
    em --> ber
    oc --> to
    to --> ber
    febr --> uar
    uar --> y
    jan --> uar
    ju --> ne
    ju --> l
    l --> y
    ma --> r
    ma --> y
    r --> ch
    
    a --> p 
    p --> r
    r --> il
    a --> u
    u --> gust
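
A quick way to check a graph like this is to enumerate every start-to-end path and compare the spelled words against the real month names. This is only a base R sketch: the edge list is transcribed by hand from the diagram, the `walk` helper is made up for illustration, and it assumes the February branch starts at `febr` so that the path spells the full name.

```r
# Edges transcribed from the state diagram ("febr" is an assumption, see above)
edges <- matrix(c(
  "sept", "em",   "nov",  "em",   "dec", "em",  "em", "ber",
  "oc",   "to",   "to",   "ber",
  "febr", "uar",  "jan",  "uar",  "uar", "y",
  "ju",   "ne",   "ju",   "l",    "l",   "y",
  "ma",   "r",    "ma",   "y",    "r",   "ch",
  "a",    "p",    "p",    "r",    "r",   "il",
  "a",    "u",    "u",    "gust"
), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))

# Start nodes are the ones that never appear as a target
starts <- setdiff(edges[, "from"], edges[, "to"])

# Depth-first walk: append node labels until a node has no outgoing edge
walk <- function(node, prefix = "") {
  word <- paste0(prefix, node)
  nxt <- edges[edges[, "from"] == node, "to"]
  if (length(nxt) == 0) return(word)
  unlist(lapply(nxt, walk, prefix = word))
}

words <- unlist(lapply(starts, walk))

# Words the graph can spell that are not months, and months it cannot spell
hallucinations <- setdiff(words, tolower(month.name))
missing_months <- setdiff(tolower(month.name), words)

print(hallucinations)
print(missing_months)
```

With these edges the walk spells all twelve months plus two extra words, `maril` and `aprch`, which appear because `ma` and `p` share the `r` node.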

tetris11@feddit.uk · edit-2 4 days ago

Genuine Question:

If you could split each month name into three parts, how would you split them to maximise the overlap between months?

“em” is a good overlap for nov/sept/dec
“uar” is good for jan/febr
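
One rough way to explore this in base R is to list every piece that can appear when a month is cut into three non-empty parts, then count how many months share each piece. This is only a counting sketch, not an optimiser: `pieces_of` is a made-up helper, pieces are compared regardless of which of the three slots they sit in, and single letters are dropped as uninteresting (my assumption).

```r
months <- tolower(month.name)

# All pieces that occur in some split of `word` into three non-empty parts
pieces_of <- function(word) {
  n <- nchar(word)
  out <- character(0)
  for (i in seq_len(n - 2)) {        # i = last letter of the first part
    for (j in (i + 1):(n - 1)) {     # j = last letter of the second part
      out <- c(out,
               substr(word, 1, i),
               substr(word, i + 1, j),
               substr(word, j + 1, n))
    }
  }
  unique(out)                        # count each piece once per month
}

# Number of months in which each piece can occur
shared <- table(unlist(lapply(months, pieces_of)))

# Keep pieces of at least two letters that are shared by more than one month
shared <- shared[nchar(names(shared)) >= 2 & shared > 1]
print(sort(shared, decreasing = TRUE))
```

Picking one split per month so that the total number of distinct pieces is minimised would be the actual optimisation problem; the table above only shows which pieces are candidates for sharing.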

tetris11@feddit.uk · 15 days ago

The weekend only refreshes so much