Simulae3: A Testament to Socratic Design.
Labels: Design, Parsing, Sciatica, SimulaE, socratic method, software engineering, Vicodin
I was recently on holiday with my family, visiting other relatives as well as friends. During the trip I pulled out my trusty MacBook Pro, fired up TextMate, pulled down the latest revision of my simulation software 'SimulaE' from Subversion and attempted to show my friend the crafty English parser component I wrote. I showed him the test suite with all of its various scenarios and then suggested he try a sentence of his own so that he could be amazed at its crafty logic.
He did, and it failed, and I was surprised, to say the least. Given that I was on vacation, I wasn't going to spend much time on the issue beyond updating the Subversion repository so that I could look into it later. Two days ago I finally did so, and after careful checking I found that one of the data files used for cross-checking and sub-classifying English parts of speech lacked the necessary word (which was also the culprit behind the mis-parse). After a quick addition to that lookup file, the test ran just fine and passed with flying colours.
Lesson learned (for what feels like the millionth time): check your supporting configuration and/or data files, because your code isn't necessarily broken; it may just be doing what it is supposed to do, based on the information (the data files) available to it.
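Because the culprit was a word missing from a lookup file, a quick pre-flight check can flag such gaps before you start blaming the parser. This is a minimal sketch. It assumes each data file holds one word per line and that sentences are tokenised on whitespace, with punctuation set off by spaces, as in the test inputs below. `load_lexicon` and `find_unknown_words` are hypothetical helpers and are not part of SimulaE.

```python
import string
from typing import Dict, Iterable, List, Set


def load_lexicon(lines: Iterable[str]) -> Set[str]:
    """Build a word set from a data file's lines (assumed: one word per line).

    Blank lines and lines starting with '#' are skipped; words are lower-cased.
    """
    words: Set[str] = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def find_unknown_words(sentence: str, lexicons: Dict[str, Set[str]]) -> List[str]:
    """Return tokens of a whitespace-tokenised sentence that no lexicon contains."""
    unknown: List[str] = []
    for token in sentence.lower().split():
        # Pure punctuation tokens ("," or "!") are not expected in word lists.
        if all(ch in string.punctuation for ch in token):
            continue
        if not any(token in words for words in lexicons.values()):
            unknown.append(token)
    return unknown


if __name__ == "__main__":
    # Hypothetical in-memory lexicons; real ones would be loaded from data files.
    lexicons = {
        "verbs": load_lexicon(["get", "kill", "paint"]),
        "articles": load_lexicon(["a", "an", "the"]),
        "nouns": load_lexicon(["bucket", "elf", "gold"]),
    }
    print(find_unknown_words("paint the bucket mauve !", lexicons))  # ['mauve']
```

Running a check like this over a failing test sentence points straight at the missing entry. You would not need to step through the parser logic at all.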
Labels: Configuration, Lesson Learned, Macintosh, MBP, Parsing, SimulaE, Subversion, SVN, Testing, TextMate
As part of an ongoing project that I have been designing, building and testing in one way or another over the past decade and a half, I have arrived at the parser phase. Well, let me correct that statement: I have tinkered with creating parsers before, but thanks to the expressive nature of the Python language, I was finally ready to make a serious attempt at writing an English present-tense, command-based parser. I'm not going to make a massive post about this, but I am going to post the test results.
Note that all the tests pass. Here is what a passing result actually means: the parser's job, as of version 0.5.4, is to break the sentence(s) apart into their components by identifying verbs, conjunctions, prepositions, articles, pronouns and punctuation.
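Here is a minimal sketch of that first stage. It assumes the sentence is already split on whitespace, as the test inputs are with their punctuation set off by spaces. Tiny hypothetical word sets stand in for SimulaE's real data files. The sketch tags each token with every class it could belong to and does not yet pick one.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical miniature lexicons standing in for SimulaE's data files.
# A word may appear in several classes ("hammer" is both verb and noun).
LEXICON: Dict[str, Set[str]] = {
    "verb": {"get", "kill", "paint", "hammer", "wipe"},
    "noun": {"hammer", "gold", "elf", "bucket", "wall", "blood"},
    "adjective": {"gold", "black", "big", "heavy"},
    "preposition": {"with", "into", "from", "off", "of"},
    "article": {"a", "an", "the"},
    "conjunction": {"and", "or", "but"},
    "pronoun": {"it", "you", "him", "me"},
}
PUNCTUATION: Set[str] = {",", ".", "!", "?"}


def tag(sentence: str) -> List[Tuple[str, List[str]]]:
    """Pair each whitespace-separated token with every class the lexicons allow.

    Ambiguous tokens keep all candidate classes for a later pass to resolve;
    tokens found in no lexicon are tagged "unknown".
    """
    tagged: List[Tuple[str, List[str]]] = []
    for token in sentence.lower().split():
        if token in PUNCTUATION:
            tagged.append((token, ["punctuation"]))
            continue
        classes = [name for name, words in LEXICON.items() if token in words]
        tagged.append((token, classes or ["unknown"]))
    return tagged


if __name__ == "__main__":
    for token, classes in tag("hammer the hammer into the wall !"):
        print(token, classes)
```

The sketch keeps every candidate class rather than choosing one immediately. That leaves words like "hammer" or "gold" open for a later disambiguation step, which can use their position in the sentence.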
Creating Parser Instance: : Passed
Loading Configuration for Instance: : Passed
Testing for version: 0.5.0
paint the gold bucket black : Passed
get the big , heavy hammer and kill Bob with it ! : Passed
get hammer and squirrel from Bob and then hammer squirrel into the wall . : Passed
get the gold gold : Passed
kill elf and get gold : Passed
paint the bucket gold : Passed
paint the gold bucket black ! : Passed
get gold : Passed
kill elf , get gold : Passed
get the large gold brick . : Passed
paint the bucket gold . : Passed
get the large , gold brick . : Passed
Testing for version: 0.5.1
get rock , pliers , hammer and squirrel and hammer squirrel into the wall . : Passed
Testing for version: 0.5.2
kill the trite little elf with my sword , then wipe the blood off of it ! : Passed
destroy the cantankerous creature before you eat your dessert : Passed
kill the trite little elf with my sword , then wipe the blood off of my sword ! : Passed
kill the trite little elf with my sword . : Passed
Testing for version: 0.5.3
hammer the hammer into the big hammer : Passed
hammer the hammer into the hammer : Passed
Testing for version: 0.5.4
kill the trite little elf with my lavacious sword , then wipe the blood off it ! : Passed
go to the store and buy a new cellphone : Passed
slit Fred's throat and capture the warm , red blood in a cup ! : Passed
play with my toys and listen to my music . : Passed
play with my toys and listen to music . : Passed
As can be seen, the inputs range from simple to complex, and from grammatically perfect sentences to questionable fragments. Because the parser is first and foremost intended for an interactive command environment, it only needs to handle the present tense. That requirement takes a massive load off the parser. Even so, the results above show that the system can differentiate words that can be used as both nouns and adjectives (such as "gold" in "get the gold gold"). The system also handles adjectives that follow the noun they modify (as in "paint the bucket gold").
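One way to tell a noun from an adjective with the same spelling is to use its position in the phrase. This is sketched purely as an assumption and is not SimulaE's actual rule. Within a noun phrase, treat the last word that can only be a noun as the head. If every candidate could also be an adjective, fall back to the last noun-capable word. Anything after the head then counts as a trailing adjective. `analyse_noun_phrase` and its word sets are hypothetical.

```python
from typing import Dict, List, Set, Union

# Hypothetical toy lexicon; a word may belong to both sets.
NOUNS: Set[str] = {"bucket", "gold", "hammer", "brick", "elf"}
ADJECTIVES: Set[str] = {"gold", "black", "big", "large", "heavy"}
ARTICLES: Set[str] = {"a", "an", "the"}


def analyse_noun_phrase(words: List[str]) -> Dict[str, Union[str, List[str]]]:
    """Split a noun phrase into head noun plus words before and after it.

    Heuristic (an assumption): the head is the last noun-only word; if none
    exists, it is the last word that can be a noun at all.
    """
    # Drop articles and commas ("the large , gold brick").
    content = [w for w in words if w not in ARTICLES and w != ","]
    noun_only = [i for i, w in enumerate(content) if w in NOUNS and w not in ADJECTIVES]
    noun_capable = [i for i, w in enumerate(content) if w in NOUNS]
    if noun_only:
        head = noun_only[-1]
    elif noun_capable:
        head = noun_capable[-1]
    else:
        raise ValueError(f"no noun found in {words!r}")
    return {
        "head": content[head],
        "pre_adjectives": content[:head],       # words positioned before the head
        "post_adjectives": content[head + 1:],  # words positioned after the head
    }


if __name__ == "__main__":
    print(analyse_noun_phrase(["the", "gold", "gold"]))
    print(analyse_noun_phrase(["the", "bucket", "gold"]))
```

The heuristic is deliberately simple. It never consults the verb, so it cannot, for instance, use "paint" to anticipate a trailing colour.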
Most notably, the system currently recognises over 9,000 verbs (regular and irregular), 50 prepositions, and a whopping 46,000+ adjectives. A call for test case phrases is hereby announced. I am satisfied enough with the stage-one parse process that I am now moving on to the second parse stage. That stage covers the creation and ordering of individual statements (as dictated by their prepositions), in preparation for the third and final stage, in which the parser sends the results from stage two to the action engine. Both of those stages will be the subjects of future posts.
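For the second stage, a rough sketch of splitting a command into separate statements and putting them in order might look like the following. It rests on two hypothetical rules. First, a new statement begins wherever a boundary word (",", "and", "then", "before" or "after") is followed by a verb, optionally preceded by "you". Second, "A after B" means B should run first. The word lists and `split_statements` are illustrative only and do not reflect SimulaE's design.

```python
from typing import List, Set

# Hypothetical word lists for illustration only.
VERBS: Set[str] = {"kill", "get", "destroy", "eat", "wipe", "go", "buy"}
BOUNDARIES: Set[str] = {",", "and", "then", "before", "after"}
SUBJECT_PRONOUNS: Set[str] = {"you"}


def split_statements(tokens: List[str]) -> List[List[str]]:
    """Split tokens into statements, returned in execution order."""
    clauses: List[List[str]] = [[]]
    links: List[List[str]] = []  # boundary words that opened each later clause
    i = 0
    while i < len(tokens):
        # Skip over a run of boundary words, then an optional subject pronoun.
        j = i
        while j < len(tokens) and tokens[j] in BOUNDARIES:
            j += 1
        k = j
        if k < len(tokens) and tokens[k] in SUBJECT_PRONOUNS:
            k += 1
        # A boundary run followed by a verb starts a new statement.
        if j > i and clauses[-1] and k < len(tokens) and tokens[k] in VERBS:
            links.append(tokens[i:j])
            clauses.append([])
            i = k  # the boundary words and pronoun are dropped
            continue
        clauses[-1].append(tokens[i])
        i += 1

    # "A after B" means B runs first, so slot B in ahead of A.
    ordered = [clauses[0]]
    for clause, link in zip(clauses[1:], links):
        if "after" in link:
            ordered.insert(len(ordered) - 1, clause)
        else:
            ordered.append(clause)
    return ordered


if __name__ == "__main__":
    print(split_statements("eat your dessert after you kill the elf".split()))
```

This toy version sidesteps a real difficulty. "hammer" is both a noun and a verb in the test sentences above. If it were added to `VERBS`, "get rock , pliers , hammer and squirrel" would be wrongly split at ", hammer". A fuller stage two would need stage one's classifications to resolve cases like that.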