
Succinct Indices for Range Queries with applications to Orthogonal Range Maxima
Rajeev Raman
We consider the problem of preprocessing $N$ points in 2D, each endowed with a priority, to answer the following queries: given an axis-parallel rectangle, determine the point with the largest priority in the rectangle. Using the ideas of the \emph{effective entropy} of range maxima queries and \emph{succinct indices} for range maxima queries, we obtain a structure that uses $O(N)$ words and answers the above query in $O(\log N \log \log N)$ time. This is a direct improvement of Chazelle's result from 1985 \cite{Chazelle88} for this problem: Chazelle required $O(N/\epsilon)$ words to answer queries in $O((\log N)^{1+\epsilon})$ time for any constant $\epsilon > 0$.
[Joint work with Arash Farzan and Ian Munro.]

A Faster Grammar-Based Self-Index
Travis Gagie
To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on straight-line programs and LZ77. In this paper we show how, given a straight-line program with $r$ rules for a string \(S [1..n]\) whose LZ77 parse consists of $z$ phrases, we can store a self-index for $S$ in $\Oh{r + z \log \log n}$ space such that, given a pattern \(P [1..m]\), we can list the $\occ$ occurrences of $P$ in $S$ in $\Oh{m^2 + \occ \log \log n}$ time. If the straight-line program is balanced and we accept a small probability of building a faulty index, then we can reduce the $\Oh{m^2}$ term to $\Oh{m \log m}$. All previous self-indexes are either larger or slower in the worst case.
[Joint work with Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich and Simon J. Puglisi.]
 Time-Space Trade-Offs for Longest Common Extensions
Benjamin Sach
We revisit the longest common extension (LCE) problem, that is, preprocess a string $T$ into a compact data structure that supports fast LCE queries. An LCE query takes a pair $(i,j)$ of indices in $T$ and returns the length of the longest common prefix of the suffixes of $T$ starting at positions $i$ and $j$. We study the time-space trade-offs for the problem, that is, the space used for the data structure vs. the worst-case time for answering an LCE query. Let $n$ be the length of $T$. Given a parameter $\tau$, $1 \leq \tau \leq n$, we show how to achieve either $O(\infrac{n}{\sqrt{\tau}})$ space and $O(\tau)$ query time, or $O(\infrac{n}{\tau})$ space and $O(\tau \log({\LCE(i,j)}/{\tau}))$ query time, where $\LCE(i,j)$ denotes the length of the LCE returned by the query. These bounds provide the first smooth trade-offs for the LCE problem and almost match the previously known bounds at the extremes when $\tau=1$ or $\tau=n$. We apply the result to obtain improved bounds for several applications where the LCE problem is the computational bottleneck, including approximate string matching and computing palindromes. Finally, we also present an efficient technique to reduce LCE queries on two strings to one string.
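As an illustration of the query semantics only, here is a naive scan (assuming 0-indexed positions); it is not one of the compact data structures of the abstract, which answer the same query far faster:

```python
def lce(T: str, i: int, j: int) -> int:
    """Return the length of the longest common prefix of the suffixes
    of T starting at positions i and j.  Naive O(LCE) scan."""
    k = 0
    while i + k < len(T) and j + k < len(T) and T[i + k] == T[j + k]:
        k += 1
    return k
```

For example, in T = "banana" the suffixes starting at positions 1 and 3 ("anana" and "ana") share the prefix "ana", so the query returns 3.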
 Regular Expression Matching: History, Status, and Challenges
Philip Bille
In this talk I’ll give a concise survey of the classical regular expression matching problem. I’ll discuss the main algorithmic techniques for worstcase efficient solutions and present a selection of open problems.
 Cache-Oblivious Implicit Predecessor Dictionaries with the Working-Set Property
Casper Kejlberg-Rasmussen
In this paper we present an implicit dynamic dictionary with the working-set property, supporting insert(e) and delete(e) in O(log n) time, predecessor(e) in O(log l_{p(e)}) time, successor(e) in O(log l_{s(e)}) time and search(e) in O(log min(l_{p(e)}, l_{e}, l_{s(e)})) time, where n is the number of elements stored in the dictionary, l_{e} is the number of distinct elements searched for since element e was last searched for, and p(e) and s(e) are the predecessor and successor of e, respectively. The time bounds are all worst-case. The dictionary stores the elements in an array of size n using no additional space. In the cache-oblivious model the log is base B, and the cache-obliviousness is due to our black-box use of an existing cache-oblivious implicit dictionary. This is the first implicit dictionary supporting predecessor and successor searches in the working-set bound; previous implicit structures required O(log n) time.
 Tree Compression with Top Trees
Oren Weimann
I will describe a new compression scheme for labeled trees based on top trees. Our approach is to replace the input tree with another tree that is in essence equivalent but is more compressible. The idea of transforming the input into a more compressible one is analogous to the Burrows-Wheeler transform for strings.
Our new compression scheme is the first to take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast navigational queries directly on the compressed representation.
We show that the new compression scheme achieves close to optimal worst-case compression, can compress exponentially better than DAG compression, is never much worse than DAG compression, and supports navigational queries in logarithmic time.
[Joint work with Philip Bille, Inge Li Gørtz, and Gad M. Landau.]
 Partial Persistence for Dynamic Planar Range Maxima Reporting
Kostas Tsakalidis
In this talk I will present a generic technique to obtain fully dynamic data structures
that support deletions in $O(insertion)$ complexity. In particular, I will show how to
augment a dynamic search tree by use of partially persistent secondary structures.
A persistent data structure is a dynamic data structure that remembers its versions as update operations are performed on it. It is called partially persistent when only the latest version is updatable, while the previous versions can only be queried.
In [ICALP'11] we apply the technique to the problem of maintaining the maximal points (points that are not dominated by any other point) of a planar point set, under insertions and deletions of points, and obtain improved deletion time over previous efforts for the pointer machine and the RAM models.
[Joint work with Gerth Stølting Brodal]
 Local Exact Pattern Matching for Non-fixed RNA Structures
Mika Amit
Detecting local common sequence-structure regions of RNAs is a biologically meaningful problem. By detecting such regions, biologists are able to identify functional similarity between the inspected molecules.
We developed dynamic programming algorithms for finding common structure-sequence patterns between two RNAs. The RNAs are given by their sequence and a set of potential base pairs with associated probabilities. In contrast to prior work, which matches fixed structures, we support the \textit{arc breaking} edit operation; this allows matching only a subset of the given base pairs.
We present an $O(n^3)$ algorithm for local exact pattern matching between two nested RNAs, and an $O(n^3\log n)$ algorithm for one nested RNA and one bounded-unlimited RNA. In addition, an algorithm for approximate pattern matching is introduced that, for two given nested RNAs and a number $k$, finds the maximal local pattern matching score between the two RNAs with at most $k$ mismatches in $O(n^3k^2)$ time.
Finally, we present an $O(n^3)$ algorithm for finding the most similar subforest between two nested RNAs.
[Joint work with Rolf Backofen, Steffen Heyne, Gad M. Landau, Mathias Möhl, Christina Schmiedl, Sebastian Will]
 Detecting Approximate Periodic Patterns in Sub-Cubic Time
Avivit Levy
Given $\epsilon\in[0, 1)$, the {\em Relative Error Periodic Pattern
Problem (REPP)} is the following:\\
{\sl INPUT:} An $n$-long sequence $S$ of numbers $s_i\in \mathbb{N}$
in increasing order.\\
{\sl OUTPUT:} The longest $\epsilon$-relative error periodic pattern,
i.e.,\ the longest subsequence $s_{i_1}, s_{i_2},\ldots, s_{i_k}$ of
$S$, for which there exists a number $p$ such that the absolute
difference between any two consecutive numbers in the subsequence is
at least $p$ and at most $p(1 + \epsilon)$.\\
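To make the definition concrete, here is an illustrative sketch (not the paper's algorithm). Note that a subsequence satisfies the condition for some $p$ exactly when its maximum gap is at most $(1+\epsilon)$ times its minimum gap, taking $p$ to be the minimum gap:

```python
from itertools import combinations

def is_repp(subseq, eps):
    """Does some p satisfy p <= gap <= p*(1+eps) for every consecutive
    gap?  Taking p = min gap, this is equivalent to the check below."""
    gaps = [b - a for a, b in zip(subseq, subseq[1:])]
    return bool(gaps) and max(gaps) <= min(gaps) * (1 + eps)

def longest_repp_bruteforce(S, eps):
    """Exponential-time reference solution, usable on tiny inputs only;
    the algorithms discussed in this abstract run in (sub-)cubic time."""
    for k in range(len(S), 1, -1):           # try longer patterns first
        for sub in combinations(S, k):       # subsequences in index order
            if is_repp(sub, eps):
                return list(sub)
    return []
```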
The best known algorithm for this problem has $O(n^3)$ time
complexity. This bound is too high for large inputs in practice. In
this paper we give a new algorithm for finding the longest relative
error periodic pattern (the REPP problem). Our algorithm is based on a
new method that considers another parameter of the input
sequence. Though our method is not guaranteed to work in sub-cubic
time for all inputs, we prove its \emph{worst case} guarantee using
this new parameter of the input. This enables us to prove that our
algorithm works in sub-cubic time on inputs for which the best known
algorithm works in $O(n^3)$ time. In fact, in many practical
situations our algorithm works in $\tilde{O}(n^2)$ time.
 RNA Tree Comparisons Via Unrooted Unordered Mappings
Nimrod Milo
RNA secondary structures are often represented as trees, motivating the application of tree-comparison-based computational tools for the detection of structural similarities and common motifs. We generalize some current approaches for the alignment of RNA trees to also consider unordered unrooted mappings. This is motivated by the drive to extend RNA structural comparisons to accommodate additional possible evolutionary phenomena (e.g. segment insertions, translocations and reversals) as well as other new variations in structural conservation which were not considered before.
The problem of unrooted unordered tree edit distance is known to be MAX SNP-hard. Hence, we focus here on a constrained tree comparison problem, based on a homeomorphic tree mapping. The new problem variant is a generalization of several previous approaches, extending them to support both the unrooting of the trees in the alignment and the unordering of their branch mappings. This problem variant is fully symmetric, allowing deletions from both of the aligned trees and thus eliminating the “text-pattern” restriction which was required in previous related work. We propose a new algorithm for this problem, which applies to several modes, including global or local alignment of trees, and supports both ordered and unordered, rooted and unrooted subtree mappings. Our new algorithm maintains the cubic time complexity of the previous, less general variants of the problem. Specifically, our time bound is $O(n_T n_S + L_T L_S \min(d_T, d_S))$, where $T$ and $S$ are the input trees and $n_T$, $L_T$ and $d_T$ are the number of nodes, the number of leaves and the maximal degree of $T$, respectively.
We implemented the algorithm as a graphical software tool (web interface available at http://www.cs.bgu.ac.il/~negevcb/FRUUT), which computes and displays similarities between secondary structures of RNAs given as input, and employed it in a preliminary experiment in which we ran all-against-all pairwise alignments of RNase P RNA family members. Our results demonstrate that applying unrooted unordered alignment to RNA trees can expose structural similarity between some members of the family more thoroughly than the traditional rooted ordered alignment approaches.
[Joint work with Shay Zakov, Erez Katzenelson, Eitan Bachmat, Yefim Dinitz and Michal Ziv-Ukelson.]
 Gene bi-targeting by viral and human miRNAs
Isana Veksler-Lublinsky
MicroRNAs (miRNAs) are an abundant class of small non-coding RNAs that can affect gene expression by post-transcriptional regulation of mRNAs. They play important roles in several biological processes (e.g., development and cell cycle regulation). Some viral organisms also encode miRNAs, a fact that contributes to the complex interactions between viruses and their hosts. A need arises to understand the functional relationship between viral and host miRNAs and their effect on viral and host genes. Our approach to meet this challenge is to identify modules where viral and host miRNAs cooperatively regulate host gene expression. We present a method to identify groups of viral and host miRNAs that cooperate in post-transcriptional gene regulation, together with their target genes that are involved in similar biological processes. We call these groups, comprising genes and miRNAs of human and viral origin, modules. The modules are found by a new two-stage procedure, which we call bi-targeting, and which is presented in this talk. The stages are (i) a new and efficient target prediction, and (ii) a new method for clustering objects of three different data types. The presented algorithm, which makes use of diverse biological data, is demonstrated to be an efficient approach for finding bi-targeting modules of viral and human miRNAs. These modules can contribute to a better understanding of viral-host interactions and the role that miRNAs play in them.
[Joint work with Yonat Shemer-Avni, Klara Kedem and Michal Ziv-Ukelson]
 Maximal-Exponent Repeats
Maxime Crochemore
The exponent of a string is the quotient of its length over its
smallest period. The exponent and the period of a string can be computed
in time proportional to its length.
We design an algorithm to compute the maximal exponent of factors of an
overlap-free string. It runs in linear time on a fixed-size alphabet, while a
naive solution of the question would run in cubic time.
The solution for non-overlap-free strings derives from algorithms to compute
all maximal repetitions, also called runs, occurring in the string.
We show there is a linear number of maximal-exponent repeats in an overlap-free
string. The algorithm can locate them all in linear time.
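The linear-time claim for a single string follows from the classical border (failure-function) table, since the smallest period of a string equals its length minus the length of its longest proper border. A minimal sketch for a nonempty string (the folklore computation, not the paper's algorithm for maximal-exponent factors):

```python
def exponent(s: str) -> float:
    """Exponent of a nonempty string: length / smallest period."""
    n = len(s)
    f = [0] * n              # f[i]: length of longest proper border of s[:i+1]
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = f[k - 1]     # fall back to the next-shortest border
        if s[i] == s[k]:
            k += 1
        f[i] = k
    period = n - f[n - 1]    # smallest period of s
    return n / period
```

For instance, "abaab" has longest border "ab", hence period 3 and exponent 5/3.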
 Exact Pattern Matching for RNA Structure Ensembles
Sebastian Will
ExpaRNA's core algorithm computes, for two fixed RNA structures,
a maximal non-overlapping set of maximal exact matchings. We
introduce an algorithm, ExpaRNA-P, that solves the lifted problem of
finding such sets of exact matchings in entire Boltzmann-distributed
structure ensembles of two RNAs. Due to a novel kind of structural
sparsification, the new algorithm maintains the time and space
complexity of the algorithm for fixed input structures. Furthermore,
we generalized the chaining algorithm of ExpaRNA in order to
compute a compatible subset of ExpaRNA-P's exact matchings. We
show that ExpaRNA-P outperforms ExpaRNA in
Bralibase 2.1 benchmarks, where we pass the chained exact
matchings as anchor constraints to the RNA alignment tool
LocARNA. Compared to LocARNA, this novel approach shows
similar accuracy but is six times faster.
[Joint work with Mika Amit, Rolf Backofen, Steffen Heyne, Gad M.
Landau, Mathias Möhl, Christina Schmiedl]
 Recent Progress on the Multi-State Perfect Phylogeny Problem
Dan Gusfield
The Multi-State Perfect Phylogeny Problem is a natural extension of the Binary Perfect Phylogeny (or Compatibility) Problem, allowing evolutionary characters to take on more than two states. The basic problem is to determine whether multi-state data can be derived on a multi-state perfect phylogeny. In population genetics terminology, the binary problem is to determine if data fits the “infinite-sites” model, while the multi-state problem is to determine if data fits the “infinite-alleles” model. The problem is of interest due to prior elegant mathematical and algorithmic results, and due to non-binary, but integer, data that is increasingly becoming available.
In 1975, Buneman showed how to view the multi-state perfect phylogeny problem as a problem of triangulating non-chordal graphs, but that result has not been widely exploited as a computational tool, despite a robust and continuing literature on the problem of triangulating non-chordal graphs. In this talk, I discuss our recent work on exploiting the chordal graph approach (and briefly, two other approaches) to study multi-state perfect phylogeny problems. I will talk about several sets of results.
The first set of results concerns problems that extend the utility of the multi-state perfect phylogeny model: the Missing Data (MD) Problem, where some entries in the input are missing and the question is whether (bounded) values for the missing data can be imputed so that the resulting data has a multi-state perfect phylogeny; and the Character-Removal (CR) Problem, where we want to minimize the number of characters to remove from the data so that the resulting data has a multi-state perfect phylogeny. Using results on minimal triangulation of non-chordal graphs, we establish new necessary and sufficient conditions for the existence of a perfect phylogeny with (or without) missing data. This leads to solutions of the MD and CR problems using integer linear programming.
The second set of results concerns generalization of the famous “four-gametes test” that characterizes the compatibility of binary data. Such a generalization was sought, but not found, in the 1970s and 1980s, and has been open since then. The generalization establishes that 3-state data has a perfect phylogeny if and only if every subset of three characters has a 3-state perfect phylogeny. We also characterize the patterns in the data that prohibit a 3-state perfect phylogeny. With 3 states there are four obstructing patterns, rather than just one (four gametes) in the case of two states. We mention recent progress (by Fumei Lam) on generalizing this result to an arbitrary number of states.
The third set of results concerns reductions of the multi-state problem to established techniques. We demonstrate that the 3-state problem reduces to the well-known and efficiently solvable 2-SAT problem. This builds on the efficient solution to the 3-state problem due to Dress and Steel. We next demonstrate that the general multi-state problem reduces to the binary problem with missing data, a problem that was discussed in a prior Newton Phylogeny meeting. Then, using tools developed for the binary case with missing data, we see how this approach solves the MD and CR problems. Finally, we show how to speed up triangulation methods by applying specification ideas.
[Joint work with Kristian Stevens, R. Gysel, Y. Wu, F. Lam, S. Sridhar.]
 Local Search for String Problems: Brute Force is Essentially Optimal
Danny Hermelin
Local search is a universal algorithmic approach for coping with
computationally hard optimization problems. When analyzing heuristics of
this type, one is interested in the amount of time required to compute a
so-called “improvement step”, and the number of such improvement steps
that are required in order to solve the problem at hand. In this talk I
will examine the first question when applied to classical NP-hard string
problems. I will show that for the Closest String problem in a natural
local search formulation, one cannot compute an improvement step in
n^{o(k)} time (assuming standard complexity-theoretic assumptions), while
simple brute force allows this in n^{O(k)} time. Similar results hold also
for Longest Common Supersequence, Shortest Common Superstring, and
Shortest Common Supersequence.
[Joint work with Jiong Guo and Christian Komusiewicz.]
 Online Sorted Range Reporting
Alex Lopez-Ortiz
[Joint work with Gerth Brodal and Rolf Fagerberg.]
 Succinct Posets
Ian Munro
We describe an algorithm for compressing a partially ordered set, or poset, so that it occupies space matching the information theory lower bound (to within lower order terms), in the worst case. Using this algorithm, we design a succinct data structure for representing a poset that, given two elements, can report whether one precedes the other in constant time. This is equivalent to succinctly representing the transitive closure graph of the poset, and we note that the same method can also be used to succinctly represent the transitive reduction graph. For an $n$ element poset, the data structure occupies $n^2/4 + o(n^2)$ bits, in the worst case, which is roughly half the space occupied by an upper triangular matrix. Furthermore, a slight extension to this data structure yields a succinct oracle for reachability in arbitrary directed graphs. Thus, using roughly a quarter of the space required to represent an arbitrary directed graph, reachability queries can be supported in constant time.
[Joint work with Pat Nicholson.]
 Pattern Matching in Multiple Streams
Raphael Clifford
I will introduce the problem of pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated
with one of s streaming texts. The task at each time step is to report
if there is a new match between a fixed pattern of length m and a
newly updated stream. As is usual in the streaming context, the goal
is to use as little space as possible while still reporting matches
quickly. We give almost matching upper and lower space bounds for
three distinct pattern matching problems. For exact matching we show
that the problem can be solved in constant time per arriving symbol
and O(m+s) words of space. For the k-mismatch and k-difference
problems we give O(k) time solutions that require O(m+ks) words of
space. In all three cases we also give space lower bounds which show
our methods are optimal up to a single logarithmic factor. Finally I
will set out a number of open problems related to this new model for
pattern matching.
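For intuition on the O(m+s) exact-matching bound, one can share a single KMP failure table across all streams and keep one automaton state per stream. The sketch below is an illustration under that assumption, not necessarily the paper's method; it gives amortized, rather than the paper's worst-case, constant time per symbol:

```python
class MultiStreamMatcher:
    """One shared failure table of O(m) words plus one integer of
    state per stream: O(m + s) words overall."""

    def __init__(self, pattern: str, num_streams: int):
        self.p = pattern
        m = len(pattern)
        self.fail = [0] * m          # classical KMP failure function
        k = 0
        for i in range(1, m):
            while k > 0 and pattern[i] != pattern[k]:
                k = self.fail[k - 1]
            if pattern[i] == pattern[k]:
                k += 1
            self.fail[i] = k
        self.state = [0] * num_streams

    def feed(self, stream: int, c: str) -> bool:
        """Append symbol c to the given stream; report a new match."""
        k = self.state[stream]
        while k > 0 and c != self.p[k]:
            k = self.fail[k - 1]
        if c == self.p[k]:
            k += 1
        if k == len(self.p):         # full match ending at this symbol
            self.state[stream] = self.fail[k - 1]
            return True
        self.state[stream] = k
        return False
```

Symbols of different streams can arrive interleaved; only the per-stream state word is touched on each arrival.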
[Joint work with Markus Jalsenius, Ely Porat and Benjamin Sach.]
 Fast Arc-Annotated Subsequence Matching in Linear Space
Inge Li Gørtz
An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings P and Q, the arc-preserving subsequence problem is to determine if P can be obtained from Q by deleting bases from Q. Whenever a base is deleted, any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are “nested” are a natural model of RNA molecules that captures both the primary and secondary structure of these. The arc-preserving subsequence problem for nested arc-annotated strings is a basic primitive for investigating the function of RNA molecules. Gramm et al. [ACM Trans. Algorithms 2006] gave an algorithm for this problem using O(nm) time and space, where m and n are the lengths of P and Q, respectively. In this paper we present a new algorithm using O(nm) time and O(n + m) space, thereby matching the previous time bound while significantly reducing the space from a quadratic term to linear. This is essential for processing large RNA molecules, where space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated strings.
[Joint work with Philip Bille.]
 Two Dimensional Range Minimum Queries and Fibonacci Lattices
Moshe Lewenstein
Given a matrix of size N, two-dimensional range minimum queries (2D-RMQs) ask for the position of the minimum element in a rectangular range within the matrix. We study trade-offs between the query time and the additional space of data structures supporting 2D-RMQs. As a novel technique, we use the discrepancy properties of Fibonacci lattices to achieve an indexing data structure of size O(N/c + c (log c)^{O(1)}) bits of additional space with O(c log c (log log c)^2) query time. Also, when the entries of the input matrix are from a fixed-size alphabet, we show that the query time can be improved to O(c log c log log c).
[Joint work with Gerth Stølting Brodal, Pooya Davoodi, Rajeev Raman and Srinivasa Rao.]
 Efficient Computation of Sparse Transforms
Oren Kapah
Efficient handling of sparse data is a key challenge in Computer Science.
Binary convolutions, such as the Fast Fourier Transform or the Walsh
Transform are a useful tool in many applications and are efficiently solved.
In the last decade, several problems required efficient solution of
sparse binary convolutions. Both randomized and deterministic
algorithms were developed for efficiently computing the sparse FFT.
The key operation in all these algorithms was length reduction: the sparse data is mapped into small vectors that preserve the convolution result. The reduction method used to date was the modulo function, since it preserves the locations (of the “1” bits) up to cyclic shift.
To date there is no known efficient algorithm for computing the sparse Walsh Transform. Since the modulo function does not preserve the Walsh transform, a new method for length reduction is needed. In this paper we present such a new method: polynomials.
This method enables the development of an efficient algorithm for
computing the binary sparse Walsh Transform. To our knowledge, this is
the first such algorithm. We also show that this method allows a
faster deterministic computation of the sparse FFT than currently
known in the literature.
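The modulo-based length reduction for the FFT side can be illustrated as follows. This is a toy sketch only: real algorithms pick the reduced length at random and must cope with colliding indices, and, as noted above, this map does not help for the Walsh transform:

```python
def sparse_to_reduced(points, q):
    """Map a sparse vector {index: value} to a dense length-q vector
    by reducing indices modulo q; colliding indices are summed."""
    v = [0] * q
    for i, x in points.items():
        v[i % q] += x
    return v

def cyclic_convolve(a, b):
    """Naive cyclic convolution of two length-q vectors (in practice
    an FFT of length q would be used here)."""
    q = len(a)
    out = [0] * q
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % q] += x * y
    return out
```

Since reducing indices modulo q is a ring homomorphism, the cyclic convolution of the reduced vectors equals the reduction of the full convolution, so the positions of the “1” bits survive up to the modulo map.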
[Joint work with A. Amir, E. Porat and A. Rothschild.]
 Binary Search Trees for Compression
John Iacono
Suppose you are receiving a string one character at a time from an
alphabet of size n and wish to output a compressed representation of
it. One method is to search for each character in a binary search tree
and output the search path to the character. This ancient method has
several benefits: it uses a number of words of space linear in the
alphabet size and compresses in time linear in the number of bits of
the output string. What are the types of patterns that will compress
well in such a scheme? What is the best choice of binary search tree?
What patterns will not compress well and are better served through
other schemes?
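A minimal sketch of such a scheme, with deliberately arbitrary choices (a static balanced tree and a hypothetical three-symbol path alphabet in which 'F' terminates each codeword, making the code self-delimiting); which tree to use is exactly the talk's open question:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def build(keys):
    """Balanced BST over a sorted alphabet; an adaptive tree such as
    a splay tree is one alternative the talk asks about."""
    if not keys:
        return None
    mid = len(keys) // 2
    node = Node(keys[mid])
    node.left, node.right = build(keys[:mid]), build(keys[mid + 1:])
    return node

def encode(root, text):
    """Output each character's search path: 'L'/'R' per comparison and
    'F' when found, so shallow characters get short codewords."""
    out = []
    for c in text:
        node = root
        while node.key != c:
            if c < node.key:
                out.append("L"); node = node.left
            else:
                out.append("R"); node = node.right
        out.append("F")
    return "".join(out)

def decode(root, code):
    """Replay the search paths to recover the text."""
    text, node = [], root
    for sym in code:
        if sym == "F":
            text.append(node.key); node = root
        else:
            node = node.left if sym == "L" else node.right
    return "".join(text)
```

With this static tree, a text dominated by characters near the root compresses to roughly one output symbol per character.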
 Configurations and Minority in the String Consensus Problem
Haim Paryenty
The Closest String Problem is defined as follows: given a set $S$ of
$k$ strings $\{s_1,\dots, s_k\}$, each of length $\ell$, find a string
$\hat{s}$ such that the maximum Hamming distance of $\hat{s}$ from
each of the strings is minimized. We denote this distance by $d$;
the string $\hat{s}$ is called a consensus string. In this paper we
present two main algorithms: the {\tt Configuration} algorithm, with
$O(k^2 \ell ^ k)$ running time for this problem,
and the {\tt Minority} algorithm.
Despite the great effort to obtain efficient algorithms for this problem, an algorithm with the natural running time of $O(\ell ^ k)$ was not known. In this paper we close this gap.
Our result means that algorithms solving the closest string problem in times $O(\ell^2), O(\ell^3), O(\ell^4)$ and $O(\ell^5)$ exist for the cases of $k=2,3,4$ and $5$, respectively. It is known that, in fact, the cases of $k=2,3,$ and $4$ can be solved in linear time. No efficient algorithm is currently known for the case of $k=5$. We prove two lemmas, the {\em unit square lemma} and the {\em minority lemma}, that exploit surprising properties of the closest string problem and enable constructing the closest string in a sequential fashion. These lemmas, with some additional ideas, give an $O(\ell^2)$ algorithm for computing a closest string of $5$ binary strings.
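For illustration, here is a tiny exponential-time reference implementation of the objective on binary strings (a naive sketch, not the {\tt Configuration} or {\tt Minority} algorithms):

```python
from itertools import product

def radius(candidate, strings):
    """Maximum Hamming distance from candidate to the input strings;
    this is the quantity d that a consensus string minimizes."""
    return max(sum(a != b for a, b in zip(candidate, s)) for s in strings)

def closest_string_bruteforce(strings, alphabet="01"):
    """Try all |alphabet|^l candidates; usable on tiny instances only."""
    l = len(strings[0])
    best, best_d = None, l + 1
    for cand in product(alphabet, repeat=l):
        d = radius(cand, strings)
        if d < best_d:
            best, best_d = "".join(cand), d
    return best, best_d
```

For example, for the three strings 000, 011 and 110, the consensus string 010 is at Hamming distance 1 from each of them.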
[Joint work with A. Amir and L. Roditty.]