Securing web service by automatic robot detection
Abstract
Web sites are routinely visited by automated agents known as Web robots, which perform actions ranging from the beneficial, such as indexing for search engines, to the malicious, such as probing for vulnerabilities, attempting to crack passwords, or spamming bulletin boards. Previous work on identifying malicious robots has relied on ad-hoc signature matching and has been performed on a per-site basis. As Web robots evolve and diversify, these techniques have not scaled. We approach the problem as a special form of the Turing test and defend the system by inferring whether the traffic source is human or robot. By extracting the implicit patterns of human Web browsing, we develop simple yet effective algorithms to detect human users. Our experiments with the CoDeeN content distribution network show that 95% of human users are detected within their first 57 requests, and 80% can be identified in only 20 requests, with a maximum false positive rate of 2.4%. In the time this system has been deployed on CoDeeN, robot-related abuse complaints have dropped by a factor of 10.