via vimeo.com
Friday, July 10, 2009
Wednesday, July 08, 2009
Regular Expression Matching and Postfix notation
As the compiler scans the postfix expression, it maintains a stack of computed NFA fragments. Literals push new NFA fragments onto the stack, while operators pop fragments off the stack and then push a new fragment. For example, after compiling theabb
inabb.+.a.
, the stack contains NFA fragments fora
,b
, andb
. The compilation of the.
that follows pops the twob
NFA fragment from the stack and pushes an NFA fragment for the concatenationbb.
. Each NFA fragment is defined by its start state and its outgoing arrows:
via swtch.com
The snippet doesn't make much sense unless you read the article, but this part, I thought was rather neat. Usually, when I wrote my crappy, one-off parsers, I just used regexes to pull out the tokens that I needed. Never thought too much about how it was implemented. But what's detailed here makes sense. Regexes are just state machines where you track whether the string you're matching against lets you traverse all the way through the state machine. And to do that, it pushes each fragment of the regex onto a stack until it reaches an operator, which then pops it off and works on it. While I've usually left post-fix notation is ass-backwards from a user perspective, I can see the elegance of the implementation. I suspect Forth and Factor are similar in this regard.
Subscribe to:
Posts (Atom)