WWW2009 EPrints

Query GeoParser: A Spatial-Keyword Query Parser Using Regular Expressions

Jason Hines (author), Tony Abou-Assaleh (author)

There has been a growing commercial interest in local information within Geographic Information Retrieval (GIR) systems. Local search engines enable the user to search for entities that contain both textual and spatial information, such as Web pages containing addresses or a business directory. Queries to these systems may therefore contain both spatial and textual components; we call these spatial-keyword queries. Parsing such a query requires breaking it into textual keywords and identifying the components of the geo-spatial description. For example, the query ‘Hotels near 1567 Argyle St, Halifax, NS’ could be parsed as having the keyword ‘Hotels’, the preposition ‘near’, the street number ‘1567’, the street name ‘Argyle’, the street suffix ‘St’, the city ‘Halifax’, and the province ‘NS’. Developing an accurate query parser is essential to providing relevant search results. Such a query parser can also be used to extract geographic information from Web pages.

One approach to developing such a parser is to use regular expressions. Our Query GeoParser is a simple but powerful regular expression-based spatial-keyword query parser. It is implemented in Perl and utilizes many of Perl’s capabilities for optimizing regular expressions. By starting with regular expression building blocks for common entities such as numbers and streets, and combining them into larger regular expressions, we are able to handle over 400 different cases while keeping the code manageable and easy to maintain. We employ the mark-and-match technique to improve parsing efficiency: first we mark numbers, city names, and states; we then use matching to extract keywords and geographic entities. The advantages of our approach include manageability, performance, and easy exception handling. Drawbacks include a lack of geographic hierarchy and the inherent difficulty in dealing with misspellings. We comment on our overall experience using such a parser in a production environment and on what we have learnt, and we suggest possible ways to deal with the drawbacks.

2009-04. Conference or Workshop Item.
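
The building-block idea described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than the Perl of the actual parser, every pattern and name below (NUMBER, STREET_NAME, parse_query, and so on) is hypothetical, and it covers a single query shape rather than the 400-plus cases mentioned above. It does not implement the mark-and-match step.

```python
import re

# Hypothetical building blocks; the real Query GeoParser patterns
# are not given in the abstract.
NUMBER = r"(?P<number>\d+)"
STREET_NAME = r"(?P<street>[A-Za-z]+(?: [A-Za-z]+)*?)"
SUFFIX = r"(?P<suffix>St|Ave|Rd|Blvd|Dr)\.?"
CITY = r"(?P<city>[A-Za-z]+(?: [A-Za-z]+)*)"
PROVINCE = r"(?P<province>[A-Z]{2})"
PREPOSITION = r"(?P<preposition>near|in|at|around)"
KEYWORDS = r"(?P<keywords>.+?)"

# Combine the small blocks into a larger address pattern,
# then into a pattern for the whole query.
ADDRESS = rf"{NUMBER} {STREET_NAME} {SUFFIX},? {CITY},? {PROVINCE}"
QUERY = re.compile(rf"^{KEYWORDS} {PREPOSITION} {ADDRESS}$", re.IGNORECASE)


def parse_query(query):
    """Return a dict of named components, or None if the query does not match."""
    match = QUERY.match(query.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_query("Hotels near 1567 Argyle St, Halifax, NS"))
```

Because each entity is defined once and reused by name, extending coverage (for example, adding street suffixes) means editing one building block rather than every composed pattern, which is the maintainability benefit the abstract points to.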
