Items where author is affiliated with Tsinghua University
Number of items: 8.
and Hu, Jian
and Zhu, Yunzhang
and Li, Hua
and Chen, Zheng Competitive Analysis from Click-Through Log.
Existing keyword suggestion tools from various search engine companies could automatically suggest keywords related to the advertisers’ products or services, counting in simple statistics of the keywords, such as search volume, cost per click (CPC), etc. However, the nature of the generalized Second Price Auction suggests that better understanding the competitors’ keyword selection and bidding strategies better helps to win the auction, other than only relying on general search statistics. In this paper, we propose a novel keyword suggestion strategy, called Competitive Analysis, to explore the keyword based competition relationships among advertisers and eventually help advertisers to build campaigns with better performance. The experimental results demonstrate that the proposed Competitive Analysis can both help advertisers to promote their product selling and generate more revenue to the search engine companies.
and Tang, Jie
and Li, Juanzi
and Zhou, Lizhu Discovering the Staring People From Social Networks.
In this paper, we study a novel problem of staring people dis- covery from social networks, which is concerned with finding people who are not only authoritative but also sociable in the social network. We formalize this problem as an optimiza- tion programming problem. Taking the co-author network as a case study, we define three objective functions and pro- pose two methods to combine these objective functions. A genetic algorithm based method is further presented to solve this problem. Experimental results show that the proposed solution can effectively find the staring people from social networks.
and Li, Guoliang
and Li, Chen
and Feng, Jianhua Efficient Interactive Fuzzy Keyword Search.
Traditional information systems return answers after a user submits a complete query. Users often feel “left in the dark” when they have limited knowledge about the underlying data, and have to use a try-and-see approach for ﬁnding information. A recent trend of supporting autocomplete in these systems is a ﬁrst step towards solving this problem. In this paper, we study a new information-access paradigm, called “interactive, fuzzy search,” in which the system searches the underlying data “on the ﬂy” as the user types in query keywords. It extends autocomplete interfaces by (1) allowing keywords to appear in multiple attributes (in an arbitrary order) of the underlying data; and (2) ﬁnding relevant records that have keywords matching query keywords approximately. This framework allows users to explore data as they type, even in the presence of minor errors. We study research challenges in this framework for large amounts of data. Since each keystroke of the user could invoke a query on the backend, we need efficient algorithms to process each query within milliseconds. We develop various incrementalsearch algorithms using previously computed and cached results in order to achieve an interactive speed. We have deployed several real prototypes using these techniques. One of them has been deployed to support interactive search on the UC Irvine people directory, which has been used regularly and well received by users due to its friendly interface and high efficiency. answers. This information-access paradigm requires the user to have certain knowledge about the structure and content of the underlying data repository. In the case where the user has limited knowledge about the data, often the user feels “left in the dark” when issuing queries, and has to use a tryand-see approach for ﬁnding information, as illustrated by the following example. At a conference venue, an attendee named John met a person from a university. After the conference he wanted to get more information about this person, such as his research projects. All John knows about the person is that he is a professor from that university, and he only remembers the name roughly. In order to search for this person, John goes to the directory page of the university. Figure 1 shows such an interface. John needs to ﬁll in the form by providing information for multiple attributes, such as name, phone, department, and title. Given his limited information about the person, especially since he does not know the exact spelling of the person’s name, John needs to try a few possible keywords, go through the returned results, modify the keywords, and reissue a new query. He needs to repeat this step multiple times to ﬁnd the person, if lucky enough. This search interface is neither efficient nor user friendly.
and Cai, Rui
and Wang, Yida
and Zhu, Jun
and Zhang, Lei
and Ma, Wei-Ying Incorporating Site-Level Knowledge to Extract Structured Data from Web Forums.
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to both complex page layout designs and unrestricted user created posts. In this paper, we study the problem of structured data extraction from various web forum sites. Our target is to ﬁnd a solution as general as possible to extract structured data, such as post title, post author, post time, and post content from any forum site. In contrast to most existing information extraction methods, which only leverage the knowledge inside an individual page, we incorporate both page-level and site-level knowledge and employ Markov logic networks (MLNs) to effectively integrate all useful evidence by learning their importance automatically. Site-level knowledge includes (1) the linkages among different object pages, such as list pages and post pages, and (2) the interrelationships of pages belonging to the same object. The experimental results on 20 forums show a very encouraging information extraction performance, and demonstrate the ability of the proposed approach on various forums. We also show that the performance is limited if only page-level knowledge is used, while when incorporating the site-level knowledge both precision and recall can be signiﬁcantly improved.
and Feng, Jianhua
and Zhou, Lizhu Interactive Search in XML Data.
In a traditional keyword-search system in XML data, a user composes a keyword query, submits it to the system, and retrieves relevant subtrees. In the case where the user has limited knowledge about the data, often the user feels “left in the dark” when issuing queries, and has to use a tryand-see approach for ﬁnding information. In this paper, we study a new information-access paradigm for XML data, called “Inks,” in which the system searches on the underlying data “on the ﬂy” as the user types in query keywords. Inks extends existing XML keyword search methods by interactively answering keyword queries. We propose effective indices, early-termination techniques, and efficient search algorithms to achieve a high interactive speed. We have implemented our algorithm. The experimental results show that Inks achieves high search efficiency and result quality.
and Nie, Zaiqing
and Liu, Xiaojiang
and Zhang, Bo
and Wen, Ji-Rong StatSnowball: a Statistical Approach to Extracting Entity Relationships.
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Boot- strapping systems significantly reduce the number of train- ing examples, but they usually apply heuristic-based meth- ods to combine a set of strict hard rules, which limit the ability to generalize and thus generate a low recall. Further- more, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify var- ious types of relations without requiring pre-specifications. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a boot- strapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses the discriminative Markov logic net- works (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimate sense. MLN is a general model, and can be configured to perform different levels of relation extraction. In StatSnwoball, pattern selection is performed by solving an l1 -norm penalized maximum like- lihood estimation, which enjoys well-founded theories and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that StatSnowball can achieve a significantly higher recall without sacrificing the high precision during it- erations with a small number of seeds, and the joint inference of MLN can improve the performance. Finally, StatSnowball is efficient and we have developed a working entity relation search engine called Renlifang based on it.
and Wang, Lu
and Guo, Xiaolin
and Pan, Aimin
and Zhu, Bin B. WPBench: A Benchmark for Evaluating the Client-side Performance of Web 2.0 Applications.
In this paper, a benchmark called WPBench is reported to evaluate the responsiveness of Web browsers for modern Web 2.0 applications. In WPBench, variations of servers and networks are removed and the benchmark result is the closest to what Web users would perceive. To achieve these, WPBench records users’ interactions with typical Web 2.0 applications, and then replays Web navigations when benchmarking browsers. The replay mechanism can emulate the actual user interactions and the characteristics of the servers and the networks in a consistent way independent of browsers so that any browser compliant to the standards can be benchmarked fairly. In addition to describing the design and generation of WPBench, we also report the WPBench comparison results on the responsiveness performance for three popular Web browsers: Internet Explorer, Firefox and Chrome.
About this site
This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.
We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years, however we will turn it into flat files for long term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a rally good (or cool) use for it... [this has now happened, this site is now static]