Code for “Most frequent subsequences in a word”

Code used in the paper “Ranks of matrices with few distinct entries”

The paper constructs $n$-letter words over $k$-letter alphabet for which the largest number of occurrences of any subsequences is $\approx k^{n/4k}$. The construction is based on $8$ sign vectors $$ \begin{align*} u^{(1)}&=(+,+,+,+,+,+,+,+),\\ u^{(2)}&=(-,-,-,+,-,+,-,-),\\ u^{(3)}&=(+,+,+,-,-,-,-,+),\\ u^{(4)}&=(-,+,-,-,+,+,+,-),\\ u^{(5)}&=(+,-,-,+,-,-,+,+),\\ u^{(6)}&=(-,+,+,+,+,-,-,-),\\ u^{(7)}&=(+,-,-,-,+,+,-,+),\\ u^{(8)}&=(-,-,+,-,-,-,+,-). \end{align*} $$ It is then asserted that these vectors satisfy the following eight properties.

$u^{(i)},u^{(i+1)}$ agree on at most two coordinates.
$u^{(i)},u^{(j)}$ agree on at most four coordinates if $u^{(i)}\neq u^{(j)}$.
$u^{(i)},u^{(i+1)},u^{(i+2)}$ do not agree on any coordinate.
$u^{(i)},u^{(i+1)},u^{(j)}$ agree on at most one coordinate if $u^{(j)}\notin \{u^{(i)},u^{(i+1)}\}$.
$u^{(i)},u^{(j)},u^{(\ell)}$ agree on at most two coordinates if $u^{(i)}, u^{(j)}, u^{(\ell)}$ are distinct.
$u^{(i)}_{\geq 7}\neq u^{(i+j)}_{\geq 7}$ for $j\in [3].$
$u^{(i)}_{\geq 6}\neq u^{(i+j)}_{\geq 6}$ for $j\in [7]$.
Vectors $u^{(i)}_{\geq 4},u^{(i+j)}_{\geq 4}$ agree on at most three coordinates for $j\in [7]$.

These can be verified by hand, with property (e) being the most tedious. Below is the Mathematica code that verifies these properties.

U = {{1, 1, 1, 1, 1, 1, 1, 1},
     {0, 0, 0, 1, 0, 1, 0, 0},
     {1, 1, 1, 0, 0, 0, 0, 1},
     {0, 1, 0, 0, 1, 1, 1, 0},
     {1, 0, 0, 1, 0, 0, 1, 1},
     {0, 1, 1, 1, 1, 0, 0, 0},
     {1, 0, 0, 0, 1, 1, 0, 1},
     {0, 0, 1, 0, 0, 0, 1, 0}};
Agree[L_List] := Table[CountDistinct[L[[All, i]]] == 1, {i, 1, Length[L[[1]]]}];
CountAgree[L_List] := Count[Agree[L], True];

(* Prop (a) *)
Print[And @@ Table[CountAgree[U[[{i, Mod[i, 8] + 1}]]] <= 2, {i, 8}]];
(* Prop (b) *)
Print[And @@ Flatten[Table[(i == j) || CountAgree[U[[{i, j}]]] <= 4, {i, 1, 8}, {j, 1, 8}]]];
(* Prop (c) *)
Print[And @@ Table[CountAgree[U[[{i, Mod[i, 8] + 1, Mod[i + 1, 8] + 1}]]] ==  0, {i, 1, 8}]];
(* Prop (d) *)
Print[And @@ Flatten[Table[(i == j) || (j == Mod[i, 8] + 1) || 
      CountAgree[U[[{i, Mod[i, 8] + 1, j}]]] <= 1, {i, 1, 8}, {j, 1, 8}]]];
(* Prop (e) *)
Print[And @@ Flatten[Table[(i == j) || (i == l) || (j == l) || 
      CountAgree[U[[{i, j, l}]]] <= 2, {i, 1, 8}, {j, 1, 8}, {l, 1, 8}]]];
(* Prop (f) *)
Print[And @@ Flatten[Table[
     U[[i, {7, 8}]] != U[[Mod[i + j - 1, 8] + 1, {7, 8}]], {i, 1, 8}, {j, 1, 3}]]];
(* Prop (g) *)
Print[And @@ Flatten[Table[
     U[[i, {6, 7, 8}]] != U[[Mod[i + j - 1, 8] + 1, {6, 7, 8}]], {i, 1, 8}, {j, 1, 7}]]];
(* Prop (h) *)
Print[And @@ Flatten[Table[
     CountAgree[U[[{i, Mod[i + j - 1, 8] + 1}, {4, 5, 6, 7, 8}]]] <= 3, {i, 1, 8}, {j, 1, 7}]]];

As a bonus, here is a code that finds the most frequent subsequences in a given word (abbracadabra in this example):

(* Compute M(v,w) when the inputs are given as lists of symbols *) 
CountCopies[v_List, w_List] := CountCopies[v, w] = 
  Which[
    Length[v] == 0, 1,
    Length[w] == 0, 0,
    v[[1]] != w[[1]], CountCopies[v, Drop[w, 1]],
    True, CountCopies[v, Drop[w, 1]] + CountCopies[Drop[v, 1], Drop[w, 1]]];
(* Same but for strings *)
CountCopies[v_String, w_String] := 
  CountCopies[Characters[v], Characters[w]];
(* Report all subsequences of w, paired with their occurrence frequency, sorted *)
MaxCopies[w_String] := Module[{L = Characters[w]},
  Sort[Tally[(L[[#]] &) /@
    Subsets[Range[Length[L]]]], (#1[[2]] < #2[[2]] &)]];
				       
Print[MaxCopies["abbracadabra"][[{-1, -2, -3}]]]

Boris Bukh

Mathematical writings

Code used in the paper “Ranks of matrices with few distinct entries”