I don't have any idea what most of this means, but it sounds awesome :-P. My main question is: would this substantially improve speed, correctness, usability, or something else that justifies the effort? I mean, if it's fun, by all means do it even if it's just an intellectual exercise to improve "elegance" without other concrete benefit.
It would improve compilation speed and the speed of the generated code (for string literals). It would also give compound regexes, NFAs, DFAs and machines fewer states, making them easier to plot and debug.
Automa produces bad code to match string literals.
The following machine:
compile(re"abracadabra")
produces this code:
Machine code
Which is almost funny in how bad it is. It's not a big deal for FASTA/SAM inputs, because those formats have no long literal strings to match.
But we can do better: literal string matching is not a difficult algorithm.
The hard part is: What about when you read from an IO? Then the buffer could run out in the middle of the string, in which case the machine should be able to match e.g. "abra", then reload the buffer, then match "cadabra". I'm not sure how to handle that.
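One possible way to handle the buffer problem, as a minimal sketch only: keep a single integer recording how many bytes of the literal have been matched so far, and carry it across buffer reloads. The names `LiteralMatcher` and `feed` are hypothetical and not part of Automa. The sketch assumes an anchored match (the literal must start at the current position) and assumes the count can be stored wherever the machine keeps its state between reloads.

```julia
# Hypothetical sketch, not Automa's API: a literal matcher whose only state
# is the number of bytes of the literal matched so far.
struct LiteralMatcher
    literal::Vector{UInt8}
end

LiteralMatcher(s::AbstractString) = LiteralMatcher(collect(codeunits(s)))

# Feed one buffer to the matcher. `matched` is the number of literal bytes
# that earlier buffers already matched. Returns (status, matched), where
# status is :match, :mismatch, or :need_more (buffer ran out mid-literal).
function feed(m::LiteralMatcher, buffer::AbstractVector{UInt8}, matched::Int=0)
    for byte in buffer
        matched == length(m.literal) && break          # literal fully matched
        byte == m.literal[matched + 1] || return (:mismatch, matched)
        matched += 1
    end
    status = matched == length(m.literal) ? :match : :need_more
    return (status, matched)
end

m = LiteralMatcher("abracadabra")
status, n = feed(m, codeunits("abra"))        # buffer runs out mid-literal
status, n = feed(m, codeunits("cadabra"), n)  # resume after reloading the buffer
```

After the first call the status is `:need_more` with four bytes matched. The second call resumes from that count and ends in `:match`.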
Putting that issue aside, I propose implementing this deeply, perhaps at every level of Automa: in the RE objects, where concatenating string/char literals should result in just a literal string; in NFAs/DFAs, where a literal string should be a single edge; and in Machines.
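To illustrate the RE-level part of the proposal, here is a rough sketch of a concatenation that fuses adjacent literals into one. The node types (`Node`, `Literal`, `AnyByte`, `Concat`) and the helpers `lit` and `cat_nodes` are made up for this example and do not reflect Automa's actual RE representation.

```julia
# Hypothetical regex node types for illustration; not Automa's RE objects.
abstract type Node end

struct Literal <: Node
    bytes::Vector{UInt8}
end

struct AnyByte <: Node end   # stands in for any non-literal node

struct Concat <: Node
    parts::Vector{Node}
end

lit(s::AbstractString) = Literal(collect(codeunits(s)))
lit(c::Char) = lit(string(c))

# Concatenate nodes, flattening directly nested Concats and fusing
# neighbouring Literals into a single Literal.
function cat_nodes(nodes::Node...)
    parts = Node[]
    for node in nodes
        for part in (node isa Concat ? node.parts : Node[node])
            if part isa Literal && !isempty(parts) && parts[end] isa Literal
                parts[end] = Literal(vcat(parts[end].bytes, part.bytes))
            else
                push!(parts, part)
            end
        end
    end
    return length(parts) == 1 ? parts[1] : Concat(parts)
end

merged = cat_nodes(lit("abra"), lit('c'), lit("adabra"))
String(copy(merged.bytes))   # "abracadabra", held in one Literal node

mixed = cat_nodes(lit("ab"), AnyByte(), lit("c"), lit("d"))
length(mixed.parts)          # 3: Literal "ab", AnyByte, Literal "cd"
```

With literals kept whole at this level, a later NFA/DFA stage could in principle turn each `Literal` into a single edge instead of one state per byte.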
It's a larger undertaking but in a sense straightforward, and it could be fun.