05 July, 2007

Check External Data/Configurations First.

I was recently on holiday with my family, visiting relatives and friends.  While there, I pulled out my trusty MacBook Pro, fired up TextMate, pulled down the newest revision of my simulation software 'SimulaE' from its Subversion repository, and set out to show my friend the crafty English parser component I wrote.  I showed him the test suite with all of its various scenarios, then suggested he try a sentence of his own so that he might be amazed at its crafty logic.  


He did, and it failed, and I was surprised, to say the least.  Given that I was on vacation, I wasn't going to spend much time on the issue beyond updating the Subversion repository so that I could look into it later.  Two days ago I finally did so, and found after careful checking that one of the data files used for cross-checking and sub-classifying English parts of speech lacked a necessary word, the very word that caused the mis-parse.  After a quick addition to that lookup file, the test ran just fine and passed with flying colours.  


Lesson learned (for what feels like the millionth time): check your supporting configuration and/or data files first, because your code may not be broken at all.  It may be doing exactly what it is supposed to, based upon the information (data files) available to it.
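
A minimal sketch of the kind of check that would have caught this sooner: before blaming the parser, list the words in a failing sentence that the lookup data does not cover. The function name `find_unknown_words` and the tiny `lexicon` dictionary are hypothetical stand-ins for illustration, not SimulaE's actual code or data files.

```python
def find_unknown_words(sentence, lexicon):
    """Return the words in `sentence` that the lookup data does not cover."""
    words = sentence.lower().split()
    # Punctuation tokens such as "," or "!" are skipped by the isalpha() test.
    return [w for w in words if w.isalpha() and w not in lexicon]


# Hypothetical lookup data: word -> set of parts of speech.
# A real lexicon would be loaded from the data files on disk.
lexicon = {
    "get": {"verb"},
    "the": {"article"},
    "gold": {"noun", "adjective"},
    "brick": {"noun"},
}

missing = find_unknown_words("get the shiny gold brick", lexicon)
print(missing)  # ['shiny']
```

If the list comes back non-empty, the data file is the first suspect, not the parsing logic.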



26 May, 2007

Present Tense English Parser : Part I

As part of an ongoing project that I have been designing, building and testing in one way or another over the past decade and a half, I have arrived at the parser phase.  Well, let me correct that statement: I have tinkered with creating parsers before, but thanks to the expressive nature of the Python language, I was finally ready to make a serious attempt at writing a command-based parser for present tense English.  I'm not going to make a massive post about this, though I am going to post the test results.


Note that all the tests pass.  What a passing result actually means is this: the parser's job as of version 0.5.4 is to break the sentence(s) apart properly into their components by identifying verbs, conjunctions, prepositions, articles, pronouns and punctuation.  


Creating Parser Instance:                                                                                  : Passed


Loading Configuration for Instance:                                                                 : Passed


Testing for version: 0.5.0 


  paint the gold bucket black                                                                             : Passed

  get the big , heavy hammer and kill Bob with it !                                           : Passed

  get hammer and squirrel from Bob and then hammer squirrel into the wall .  : Passed

  get the gold gold                                                                                             : Passed

  kill elf and get gold                                                                                         : Passed

  paint the bucket gold                                                                                       : Passed

  paint the gold bucket black !                                                                           : Passed

  get gold                                                                                                            : Passed

  kill elf , get gold                                                                                              : Passed

  get the large gold brick .                                                                                  : Passed

  paint the bucket gold .                                                                                     : Passed

  get the large , gold brick .                                                                                : Passed


Testing for version: 0.5.1 


  get rock , pliers , hammer and squirrel and hammer squirrel into the wall .     : Passed


Testing for version: 0.5.2 


  kill the trite little elf with my sword , then wipe the blood off of it !                : Passed

  destroy the cantankerous creature before you eat your dessert                        : Passed

  kill the trite little elf with my sword , then wipe the blood off of my sword !  : Passed

  kill the trite little elf with my sword .                                                               : Passed


Testing for version: 0.5.3 


  hammer the hammer into the big hammer                                                       : Passed

  hammer the hammer into the hammer                                                             : Passed


Testing for version: 0.5.4 


  kill the trite little elf with my lavacious sword , then wipe the blood off it !   : Passed

  go to the store and buy a new cellphone                                                        : Passed

  slit Fred's throat and capture the warm , red blood in a cup !                         : Passed

  play with my toys and listen to my music .                                                    : Passed

  play with my toys and listen to music .                                                          : Passed


As can be seen, the possible inputs for the parser vary from simple to complex, and from grammatically perfect to questionable fragments.  The parser is first and foremost intended for use in a command environment in which interaction is needed, hence the present-tense-only requirement.  This greatly reduces the demands on the parser, but even so, it can be seen from the above that the system can differentiate key words that can be used in both noun and adjective forms (for example, 'gold' in 'get the gold gold').  The system also handles post-adjective usage, as in 'paint the bucket gold'.  
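
To make the noun/adjective distinction concrete, here is a rough sketch of one way such a word could be resolved from its neighbours. It is an illustration only, not the parser's actual logic: the `LEXICON` table, the `tag` function and the three neighbour rules are all assumptions made up for this example.

```python
# Hypothetical lookup data; a real lexicon would be loaded from data files.
LEXICON = {
    "paint": {"verb"},
    "get": {"verb"},
    "the": {"article"},
    "bucket": {"noun"},
    "gold": {"noun", "adjective"},
    "black": {"adjective"},
}


def tag(sentence):
    """Tag each word, resolving noun/adjective ambiguity from its neighbours."""
    words = sentence.lower().split()
    tagged = []
    for i, word in enumerate(words):
        kinds = LEXICON.get(word, {"unknown"})
        if kinds == {"noun", "adjective"}:
            has_next = i + 1 < len(words)
            next_kinds = LEXICON.get(words[i + 1], set()) if has_next else set()
            prev_tag = tagged[-1][1] if tagged else None
            if "noun" in next_kinds:
                kind = "adjective"  # "gold bucket": modifies the noun that follows
            elif prev_tag == "noun":
                kind = "adjective"  # "bucket gold": post-adjective usage
            else:
                kind = "noun"       # "get gold": stands alone as the object
        else:
            kind = sorted(kinds)[0]  # unambiguous in this toy lexicon
        tagged.append((word, kind))
    return tagged


print(tag("get the gold gold"))
print(tag("paint the bucket gold"))
```

With this toy lexicon, the first 'gold' in 'get the gold gold' comes out as an adjective and the second as a noun, while the 'gold' in 'paint the bucket gold' is tagged as an adjective following its noun. Words that double as verb and noun, such as 'hammer', would need further rules that this sketch does not attempt.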


Most notably, the system currently recognises over 9,000 verbs (regular and irregular), 50 prepositions, and a whopping 46,000+ adjectives.  A call for test case phrases is hereby announced.  I am satisfied enough with the stage one parse process that I am moving on to the second parse stage, namely the creation and ordering of individual statements (as dictated by their prepositions), in preparation for the third and final stage, in which the parser sends the results from stage two to the action engine.  Both of those stages will be the subjects of new posts.
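
Since stage two is not written yet, the following is only a guess at its general shape: split a command into individual statements at each known verb, then group the words of each statement by the preposition that introduces them. The word sets and the function names (`split_statements`, `group_by_preposition`, `parse_commands`) are hypothetical and exist purely for this sketch.

```python
# Hypothetical word lists; the real parser would use its lookup files.
VERBS = {"kill", "get", "wipe", "paint"}
SEPARATORS = {"and", "then", ",", ".", "!"}
PREPOSITIONS = {"with", "into", "from", "to", "off"}


def split_statements(words):
    """Start a new statement at every known verb, dropping trailing separators."""
    statements, current = [], []
    for word in words:
        if word in VERBS and current:
            statements.append(current)
            current = []
        current.append(word)
    if current:
        statements.append(current)
    cleaned = []
    for statement in statements:
        while statement and statement[-1] in SEPARATORS:
            statement.pop()
        if statement:
            cleaned.append(statement)
    return cleaned


def group_by_preposition(statement):
    """Split one statement into its verb, direct object and prepositional phrases."""
    groups = {"verb": statement[0], "object": []}
    key = "object"
    for word in statement[1:]:
        if word in PREPOSITIONS:
            key = word          # later words belong to this preposition
            groups[key] = []    # simplification: a repeated preposition overwrites
        else:
            groups[key].append(word)
    return groups


def parse_commands(sentence):
    """Return one grouped statement per verb found in the sentence."""
    words = sentence.lower().split()
    return [group_by_preposition(s) for s in split_statements(words)]


for command in parse_commands("kill the elf with my sword , then get gold"):
    print(command)
```

For the sample sentence this yields two statements: one with the verb 'kill', the object 'the elf' and a 'with' phrase 'my sword', and one with the verb 'get' and the object 'gold'. Splitting on every known verb is deliberately naive; a sentence such as 'get hammer and squirrel', where 'hammer' is a noun, would need the part-of-speech decisions from stage one before it could be split correctly.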

