Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Assuming each state transition consumes a character (“epsilon-free NFA”), just follow the definition. Before each step, you have a list of NFA states you might be in, initially containing only the start state. During each step, go through each state on that list and enter any acceptable successors into a new one. Deduplicate and repeat.

This is (potentially super)linear in the number of states, but that doesn’t preclude it from being linear in the input length. In particular, even though it’s essentially a bad lazy implementation of standard NFA-to-DFA conversion, you’re avoiding the exponential state blowup you get in the eager version. (I say it’s a bad lazy implementation because a proper one would memoize; that, however, would bring the worst-case blowup back.) I think that, in addition to the fact that people do actually do this kind of thing in production implementations, qualifies it as a separate approach.

If your NFA does have epsilon-transitions—like, say, the NFAs obtained from the textbook Thompson regexp-to-NFA construction—you can either use a different (Glushkov) regexp-to-NFA construction[1], eliminate those transitions as a separate pass[2], or adjust the algorithm above to eliminate duplicate NFA states on the fly[3].

[1]: https://en.wikipedia.org/wiki/Glushkov%27s_construction_algo...

[2]: https://news.ycombinator.com/item?id=19272990

[3]: https://swtch.com/~rsc/regexp/regexp2.html



Last time I implemented this myself I just computed the epsilon closure at each step.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: