A quick n-gram analysis of the month names (n-grams of at least 3 letters) gives the following 9 n-grams with a frequency greater than 1:
n-gram  ary  uar  uary  emb  embe  ember  mbe  mber  ber
count   2    2    2     3    3     3      3    3     4
As you can see, the two longest common motifs are “em-ber” and “uar-y”.
Using this, I propose the following graph:
Mermaid
stateDiagram
direction LR
sept --> em
nov --> em
dec --> em
em --> ber
oc --> to
to --> ber
febr --> uar
uar --> y
jan --> uar
ju --> ne
ju --> l
l --> y
ma --> r
ma --> y
r --> ch
a --> p
p --> r
r --> il
a --> u
u --> gust
Genuine Question:
If you could split the month names into 3 parts, how would you split them to maximise their choice overlap?
I assume the post is the maximum. I wonder if there is an algorithm for that.
Hierarchical letter clustering would be my guess, or graph-based clustering using n-grams of length 2-4 as nodes and maximising for connections.
Or using an optimized regex and printing out the DFA?
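One cheap way to poke at the splitting question, before reaching for clustering, is a greedy local search. The sketch below is only an illustration under my own assumptions: every month is cut into exactly three non-empty pieces, "overlap" is scored as the number of distinct pieces used across all months (lower means more sharing), and the helper names `three_way_splits`, `distinct_segments` and `greedy_split` are made up for this example. It finds a local optimum, not necessarily the best split.

```python
from itertools import combinations

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def three_way_splits(word):
    """All ways to cut a word into three non-empty pieces."""
    return [(word[:i], word[i:j], word[j:])
            for i, j in combinations(range(1, len(word)), 2)]


def distinct_segments(splits):
    """Number of distinct pieces used across all months (lower = more sharing)."""
    return len({piece for parts in splits.values() for piece in parts})


def greedy_split(months):
    """Change one month's split at a time while the score strictly improves."""
    # Start from the first candidate split of every month.
    current = {m: three_way_splits(m)[0] for m in months}
    improved = True
    while improved:
        improved = False
        for m in months:
            for candidate in three_way_splits(m):
                trial = {**current, m: candidate}
                if distinct_segments(trial) < distinct_segments(current):
                    current = trial
                    improved = True
    return current


if __name__ == "__main__":
    best = greedy_split(MONTHS)
    for month, parts in best.items():
        print(month, "->", " | ".join(parts))
    print("distinct pieces:", distinct_segments(best))
```

The score only ever decreases, so the loop terminates. A different starting point or a different score (for example, counting shared edges instead of shared pieces) may well give a different answer.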
Edit: Quick N-gram analysis (min=3, max=num letters in that month)
R-code
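For anyone who wants to reproduce the counts without R, here is a minimal Python sketch of the same kind of analysis (not the original R code). It assumes lowercase English month names, counts every substring of length 3 up to the full length of the month, and keeps those that occur more than once. The function names `ngram_counts` and `repeated_ngrams` are just names chosen for this example.

```python
from collections import Counter

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def ngram_counts(words, min_n=3):
    """Count every character n-gram with min_n <= n <= len(word)."""
    counts = Counter()
    for word in words:
        for n in range(min_n, len(word) + 1):
            for start in range(len(word) - n + 1):
                counts[word[start:start + n]] += 1
    return counts


def repeated_ngrams(words, min_n=3):
    """Keep only the n-grams that occur more than once."""
    return {gram: c for gram, c in ngram_counts(words, min_n).items() if c > 1}


if __name__ == "__main__":
    # Print the repeated n-grams, least frequent first.
    for gram, count in sorted(repeated_ngrams(MONTHS).items(),
                              key=lambda item: (item[1], item[0])):
        print(gram, count)
```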
Interestingly, “maril” (ma → r → il) and “aprch” (a → p → r → ch) are the only two hallucinations; everything else is always a legit month.
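The claim is easy to check by walking every path through the graph. The sketch below hard-codes the edges of the diagram as a Python dict and lists every word the graph can spell. Two assumptions of mine: the February start node is written as `febr` so that the path spells the full name, and the helper names (`start_nodes`, `walk`, `generated_words`) are invented for this example.

```python
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

# Edges of the state diagram: node -> list of possible next nodes.
EDGES = {
    "sept": ["em"], "nov": ["em"], "dec": ["em"], "em": ["ber"],
    "oc": ["to"], "to": ["ber"],
    "febr": ["uar"], "jan": ["uar"], "uar": ["y"],  # "febr" is an assumption
    "ju": ["ne", "l"], "l": ["y"],
    "ma": ["r", "y"], "r": ["ch", "il"],
    "a": ["p", "u"], "p": ["r"], "u": ["gust"],
}


def start_nodes(edges):
    """Nodes that no edge points to."""
    targets = {t for nexts in edges.values() for t in nexts}
    return [node for node in edges if node not in targets]


def walk(node, edges):
    """Yield every word spelled by a path from node to a terminal node."""
    nexts = edges.get(node, [])
    if not nexts:
        yield node
        return
    for nxt in nexts:
        for tail in walk(nxt, edges):
            yield node + tail


def generated_words(edges):
    """All words the graph can produce from any start node."""
    return {word for start in start_nodes(edges) for word in walk(start, edges)}


if __name__ == "__main__":
    words = generated_words(EDGES)
    print("missing months:", sorted(MONTHS - words))
    print("hallucinations:", sorted(words - MONTHS))
```

Both hallucinations come from the shared `r` node, which is reachable from `ma` and from `p` and leads to both `ch` and `il`.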