Although network reconnaissance through scanning has been well explored in the literature, new scan detection proposals with various detection features and capabilities continue to appear. To our knowledge, however, there has been little discussion of reliable methodologies for evaluating network scanning detectors. In this paper, we show that establishing ground truth labels of scanning activity on non-synthetic network traces is more difficult than labeling conventional intrusions. The main problem stems from the lack of absolute ground truth (AGT), and we identify the specific types of errors that this lack admits. In real-world network traffic, many events can typically be interpreted equally well as legitimate activity or as intrusions; establishing AGT is therefore infeasible, since it depends on unknowable intent. We explore how an estimated ground truth based on discrete classification criteria can be misleading, since typical detection accuracy measures depend strongly on the chosen criteria. We also present a methodology for evaluating and comparing scan detection algorithms. The methodology classifies remote addresses based on continuous scores designed to provide a more accurate reference for evaluation. The challenge of conducting a reliable evaluation in the absence of AGT also applies to other areas of network intrusion detection, where corresponding requirements and guidelines apply.

Absolute ground truth, Evaluation, Ground truth reference, Scan detection
