2017, Ritam Raha’s internship

The piecewise testable index of a word u, denoted h(u), is the
least integer n such that u is completely defined by listing its (scattered) subwords of length at most n. Here a subword is a subsequence of u, not a factor. Equivalently, h(u) is the least n such that u can be defined in the n-variable BΣ₁ fragment of the first-order logic over words. It was recently shown in [1] that being able to bound or estimate the index of words is an invaluable tool for measuring the complexity of piecewise-testable language.

During his internship, Ritam Raha worked on the problem of efficiently computing h(u). Polynomial-time (cubic) methods were already described in [1] but we managed to propose new algorithms running in O(|A|.|u|), where |A| is the size of the underlying alphabet. We also found unexpected connections between h(u) and m(u), a measure introduced by I. Simon in [2] and mostly forgotten since: m(u) is the least m such that u is m-reduced, i.e., such that removing any letter from yields a word u’ that has less subwords of length at most m. Since m(u) is easier to compute than h(u), the connections have algorithmic implications. An outcome of this research is that we could characterize the longest words on a 2-letter alphabet with given index, yielding exact bounds for h(u) as a function of the length of u. We could also improve the known bounds (from [1]) on larger alphabets. These results will eventually find their place in a paper.

[1] P. Karandikar and Ph. Schnoebelen. The height of piecewise-testable languages with applications in logical complexity. In Proc. 25th EACSL Conf. Computer Science Logic (CSL 2016), Marseille, France, 2016.

[2] I. Simon. Hierarchies of Event with Dot-Depth One. PhD thesis, University of Waterloo, Dept. Applied Analysis and Computer Science, Waterloo, ON, Canada, 1972.