Performance Evaluation of HITS-based Algorithms

L. Li, Y. Shang, H. Shi, and W. Zhang (USA)

Keywords

Statistical metric, HITS-based algorithms, relevance scoring methods, information retrieval

Abstract

In this paper, we propose a general method to statistically compare the performance of HITS-based algorithms that are combined with one of four relevance scoring methods (CDR, TLS, VSM, and Okapi) using a set of broad-topic queries. The method incorporates several statistical metrics and automatically selects an appropriate one according to the problem parameters. We empirically test five representative statistical metrics under different conditions through simulation: expected loss, Friedman statistic, probability of win, interval-based selection, and probably approximately correct. In the experiments, expected loss is the best metric for small means, such as 1 or 2, and probably approximately correct is the best in all other cases. Among the four relevance scoring methods, CDR is statistically the best when combined with a HITS-based algorithm.
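
As an illustration of one of the five metrics named above, the following is a minimal sketch of the textbook Friedman statistic (without tie correction), computed over per-query scores. It is not the authors' implementation: the function names `average_ranks` and `friedman_statistic` and the sample scores are hypothetical, and the abstract does not say how the statistic is computed in the paper.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """Friedman statistic for scores[query][method], no tie correction.

    Each row is one query (block); each column is one method (treatment).
    """
    n = len(scores)       # number of queries
    k = len(scores[0])    # number of methods being compared
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Made-up scores for 3 queries and 4 hypothetical methods (not data from the paper).
example_scores = [
    [0.10, 0.20, 0.30, 0.40],
    [0.15, 0.25, 0.35, 0.45],
    [0.05, 0.50, 0.60, 0.70],
]
q = friedman_statistic(example_scores)
```

With n queries and k methods, the statistic reaches its maximum n(k - 1) when every query ranks the methods in the same order, as in the made-up sample; larger values indicate stronger evidence that the methods differ.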
