Perimeter Intrusion Detection by Video Surveillance: A Survey

09 Oct., 2023

 

Given a video or set of videos, the PIDS detects intrusions. To evaluate the performance, we need to compare the PIDS output with ground truth annotations. The manner in which this evaluation is carried out impacts the final score metric. The following subsections present different evaluation protocols.

5.1. Frame-Level Evaluation

As the name suggests, in this type of evaluation, we are interested in checking whether each frame of the video is correctly classified as intrusion or normal. For a video and given intrusion parameters, the frame-level ground truth assigns a label to each frame of the video: the value 1 denotes the intrusion class, and 0 the normal class.

Given the ground truth and the prediction (see Equation (5)), the frame-level intrusion evaluation is simply the binary classification evaluation of each frame of the video [76]. We can calculate the elements of the confusion matrix, i.e., the true positive (TP), false negative (FN), false positive (FP) and true negative (TN), with intrusion as the objective class [6]. We can then evaluate the performance of the PIDS depending on the choice of metric, such as the precision, recall, F-score, etc., as defined in Section 5.4.

In this type of evaluation, each frame contributes equally to the overall score. Thus, it can give the same overall score to an algorithm that misses several intrusion events entirely and to an algorithm that misses only some frames in each of several intrusion events. This is undesirable for intrusion detection because we cannot afford to miss intrusion events. In reality, we are more interested in knowing whether the system is able to classify each intrusion event correctly as a whole. This demands an event-level evaluation. In other words, we want to detect all intrusion intervals (IIs) in the video. More specifically, we are interested in evaluating whether the beginning of each intrusion interval is detected correctly, because an intrusion event that is detected too late is not very useful. The idea is to detect each intrusion interval as soon as it occurs and, thus, we need an evaluation scheme that takes this into account.
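The frame-level counting, and the drawback of frame-level scoring discussed in this section, can be illustrated with a minimal sketch. The function name `frame_level_counts` and the toy label sequences are assumptions made for illustration; they do not come from the survey.

```python
from typing import Dict, Sequence


def frame_level_counts(ground_truth: Sequence[int],
                       prediction: Sequence[int]) -> Dict[str, int]:
    """Count TP, FN, FP and TN over frames, with intrusion (label 1) as the positive class."""
    if len(ground_truth) != len(prediction):
        raise ValueError("ground truth and prediction must cover the same frames")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for gt, pred in zip(ground_truth, prediction):
        if gt == 1 and pred == 1:
            counts["TP"] += 1
        elif gt == 1 and pred == 0:
            counts["FN"] += 1
        elif gt == 0 and pred == 1:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical 14-frame video with two intrusion events (frames 2-5 and 8-11).
ground_truth = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
# Prediction A misses the second event entirely.
prediction_a = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Prediction B detects the start of both events but misses two frames of each.
prediction_b = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

counts_a = frame_level_counts(ground_truth, prediction_a)
counts_b = frame_level_counts(ground_truth, prediction_b)
print(counts_a)
print(counts_b)
```

Both predictions yield identical frame-level counts, although prediction A misses a whole intrusion event while prediction B detects the beginning of both.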

5.2. i-LIDS Evaluation

For evaluating on the i-LIDS dataset, its user guide provides an evaluation procedure [26]. It focuses on evaluating intrusion at event-level rather than frame-level. To be precise, an intrusion is considered correctly detected if there is at least one system alarm within 10 s from the start of the intrusion event. For an II of the video and alarms, the rules of the i-LIDS evaluation protocol are as follows:

TP: if there is at least one alarm within 10 s from the beginning of the II. If there are multiple alarm candidates, the first one is taken and the rest are ignored.

FP: if there is an alarm but not within 10 s from the beginning of the II. If there are consecutive FPs within a 5-s gap among them, only the first one is considered and the rest are ignored.

Apart from these, one rule is specific to the i-LIDS dataset: all IIs and alarms that start within 5 min from the beginning of the video are ignored. This gives the system a preparation time. This evaluation scheme is not generic and has several drawbacks, as illustrated in Figure 7. It penalizes an alarm as an FP after 10 s from the beginning of an II without taking into account the duration of the intrusion. If the II has a long duration (such as an hour) and we have an alarm at the 11th second, it is not ideal to mark it as an FP. From a practical point of view, the surveillance personnel will receive a mini-clip as soon as the alarm is triggered and, if the intrusion is present, then it is not sensible to mark this as an FP. Instead, this alarm should be ignored, as the intrusion is not detected within 10 s. Similarly, each alarm after 10 s but within the II is considered as an FP, and this strongly penalizes the system precision. Instead, these extra alarms should be counted without assigning them as an FP.
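The i-LIDS rules described in this section can be sketched as follows. This is an illustrative reading of the rules, not the official i-LIDS scoring tool: the function name `ilids_counts`, the representation of times in seconds, the handling of boundary values, and the FN rule (no alarm within 10 s of the start of an II) are all assumptions.

```python
from typing import List, Tuple


def ilids_counts(ii_starts: List[float], alarms: List[float],
                 window: float = 10.0, fp_gap: float = 5.0,
                 warmup: float = 300.0) -> Tuple[int, int, int]:
    """Return (TP, FN, FP) for one video; all times are in seconds."""
    # Dataset-specific rule: ignore IIs and alarms that start in the first 5 min.
    starts = sorted(t for t in ii_starts if t >= warmup)
    kept_alarms = sorted(a for a in alarms if a >= warmup)

    tp = fn = 0
    matched = set()
    for start in starts:
        in_window = [a for a in kept_alarms if start <= a <= start + window]
        if in_window:
            tp += 1
            # The first alarm is the TP; the other candidates are ignored.
            matched.update(in_window)
        else:
            # Assumption: an II with no alarm in its 10-s window is an FN.
            fn += 1

    fp = 0
    last_fp = None
    for a in kept_alarms:
        if a in matched:
            continue
        # Consecutive FPs separated by at most 5 s: only the first is counted.
        if last_fp is None or a - last_fp > fp_gap:
            fp += 1
        last_fp = a
    return tp, fn, fp


# Hypothetical video: two IIs starting at 400 s and 1000 s.
result = ilids_counts(ii_starts=[400.0, 1000.0],
                      alarms=[100.0, 405.0, 407.0, 420.0, 422.0, 1015.0])
print(result)  # (1, 1, 2)
```

Note that the end of an II never enters the computation: the alarm at 420 s is counted as an FP even if the first intrusion is still ongoing, which is the drawback discussed above.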

5.3. Edge-Level Evaluation

To appropriately evaluate a PIDS while considering real-world aspects, we propose a new evaluation protocol. An intrusion event begins with a transition from a non-intrusion to an intrusion state, i.e., a rising edge, as shown in Figure 8. Similarly, an intrusion event ends with the reverse transition, i.e., a falling edge. We are interested in detecting an intrusion within a few frames from the rising edge. Since we focus on this rising edge, we call this the edge-level evaluation. In other words, we emphasize detecting the beginning of intrusion intervals. We first define the following terms for an intrusion interval (II) of the video (see Figure 8).

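The intrusion interval neighborhood IN is an expanded interval defined by $n_{\mathrm{pre}}$ frames before and $n_{\mathrm{post}}$ frames after the II:

$$\mathrm{IN}(\mathrm{II}, n_{\mathrm{pre}}, n_{\mathrm{post}}) = [t_{\mathrm{pre}}, t_{\mathrm{post}}] \quad \text{s.t.} \quad t_{\mathrm{pre}} = t_{\mathrm{start}}(\mathrm{II}) - n_{\mathrm{pre}} \ \text{and} \ t_{\mathrm{post}} = t_{\mathrm{end}}(\mathrm{II}) + n_{\mathrm{post}}.$$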

The values of $n_{\mathrm{pre}}$ and $n_{\mathrm{post}}$ are in the range of one to five frames (at most 1/5 s for a video at 25 FPS) and are added to account for annotation error. This error arises because it is difficult to mark the exact frame at which an intrusion starts or ends. This tolerance also avoids strictly penalizing the system when an intrusion event is detected a few frames before the actual event, or when the system detects a few more intrusion frames after the actual event has finished. These cases often arise when the intruding object is in the scene but not inside the surface to protect. Therefore, IN is the interval in which the actual intrusion activity takes place, and an alarm given by a PIDS in this interval is either counted as a TP or ignored. An alarm given outside IN must be a false alarm and should be counted as an FP.

The intrusion beginning neighborhood IBN is an interval comprising $n_{\mathrm{pre}}$ frames before and $n$ frames from the beginning of the II:

$$\mathrm{IBN}(\mathrm{II}, n_{\mathrm{pre}}, n) = [t_{\mathrm{pre}}, t_{n}] \quad \text{s.t.} \quad t_{\mathrm{pre}} = t_{\mathrm{start}}(\mathrm{II}) - n_{\mathrm{pre}} \ \text{and} \ t_{n} = t_{\mathrm{start}}(\mathrm{II}) + n.$$

This interval signifies the importance of the initial frames of an II, where an intruder has just entered the protected area, and it is in this interval where we ideally want the PIDS to raise an alarm. An alarm raised in IBN must be a TP.
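The two neighborhoods defined above can be computed directly from the start and end of an II. The following is a minimal sketch; the `Interval` type, the function names, the use of inclusive frame indices, and the example values are assumptions made for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # first frame index (inclusive)
    end: int    # last frame index (inclusive)


def intrusion_neighborhood(ii: Interval, n_pre: int, n_post: int) -> Interval:
    """IN: the II expanded by n_pre frames before and n_post frames after."""
    return Interval(ii.start - n_pre, ii.end + n_post)


def intrusion_beginning_neighborhood(ii: Interval, n_pre: int, n: int) -> Interval:
    """IBN: n_pre frames before and n frames from the beginning of the II."""
    return Interval(ii.start - n_pre, ii.start + n)


# Hypothetical II covering frames 100 to 400, with a 3-frame annotation
# tolerance and a 25-frame detection window (1 s at 25 FPS).
ii = Interval(start=100, end=400)
neighborhood = intrusion_neighborhood(ii, n_pre=3, n_post=3)
beginning = intrusion_beginning_neighborhood(ii, n_pre=3, n=25)
print(neighborhood)  # Interval(start=97, end=403)
print(beginning)     # Interval(start=97, end=125)
```

The sketch does not clamp the intervals to the first and last frame of the video; an implementation working on real footage would probably need to.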

For an II and alarms $A(P)$, the possible outcomes at edge-level are defined as (see Figure 8):

1. TP: if there is at least one alarm in IBN. For multiple alarms in IBN, only the first one is considered, and the rest are ignored.

2. FN: if there is no alarm in IBN.

3. FP: if an alarm is outside IN. Each alarm outside of IN is counted as an FP.
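For a single II, these three outcomes can be sketched as below. The function name, the inclusive `(first, last)` frame tuples, and the example alarms are assumptions made for illustration. One further assumption concerns short IIs for which IBN extends beyond IN: here an alarm inside IBN is never counted as an FP.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (first frame, last frame), both inclusive


def _inside(frame: int, interval: Interval) -> bool:
    return interval[0] <= frame <= interval[1]


def edge_level_outcome(alarms: List[int], ibn: Interval,
                       in_: Interval) -> Dict[str, int]:
    """Classify the alarms around one II into TP, FN, FP and ignored."""
    in_ibn = [a for a in alarms if _inside(a, ibn)]
    # Assumption: an alarm inside IBN is never an FP, even if IBN exceeds IN.
    outside = [a for a in alarms
               if not _inside(a, in_) and not _inside(a, ibn)]
    tp = 1 if in_ibn else 0  # only the first alarm in IBN counts
    return {
        "TP": tp,
        "FN": 1 - tp,
        "FP": len(outside),
        # Extra alarms in IBN and alarms in IN but outside IBN.
        "ignored": len(alarms) - tp - len(outside),
    }


# Hypothetical II with IBN = frames 97-125 and IN = frames 97-403.
outcome = edge_level_outcome([50, 110, 120, 300, 500],
                             ibn=(97, 125), in_=(97, 403))
print(outcome)  # {'TP': 1, 'FN': 0, 'FP': 2, 'ignored': 2}
```

In this example, the alarm at frame 110 is the TP, the alarms at frames 50 and 500 are FPs, and the alarms at frames 120 and 300 are ignored.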

In this evaluation scheme, alarms lying outside IBN but inside IN are ignored. This means that we neither adversely penalize these alarms as an FP nor count them as a TP. In event-level evaluation, whether i-LIDS or this scheme, we do not define a true negative (TN). A TN is when a normal (non-intrusion) event is detected as such; in other words, how well we are classifying a normal event as normal. However, this is not the aim of intrusion detection; indeed, it is the opposite. Furthermore, the calculation of TN is ambiguous. We cannot generalize what length of the non-intrusion video should be considered as a TN. For example, a non-intrusion video clip of 5 min cannot be considered as similar to a non-intrusion video clip of 5 days.

These rules are for individual IIs, but we must also deal with scenarios where the intrusion neighborhoods are so close that they intersect one another. If the INs of two or more IIs intersect one another, then we merge them into a single IN. The new IN consists of the $n_{\mathrm{pre}}$ frames of the first II and the $n_{\mathrm{post}}$ frames of the last II, and all of the frames in between are merged as a single II. Algorithm 1 summarizes the protocol to evaluate a video at edge-level.

Algorithm 1: Edge-Level Evaluation of a PIDS

1. Initialize variables $n$, $n_{\mathrm{pre}}$ and $n_{\mathrm{post}}$.
2. Calculate IN for all IIs of the video.
3. If two or more INs intersect, merge them into a single expanded IN.
4. Calculate the intrusion beginning neighborhood IBN for each II.
5. Obtain alarms $A(P)$ from the PIDS.
6. Calculate TP, FN and FP.
7. Calculate precision, recall and other metrics.
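The steps of Algorithm 1 can be sketched end to end as follows. This is one possible reading under stated assumptions, not the authors' implementation: the function names, the inclusive frame-index intervals, the convention that INs sharing at least one frame intersect, the rule that an alarm inside an IBN is never an FP, and the value 0.0 returned for an undefined precision or recall are all choices made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (first frame, last frame), both inclusive


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Step 3: merge intervals that share at least one frame."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def evaluate_edge_level(iis: List[Interval], alarms: List[int],
                        n: int, n_pre: int, n_post: int) -> Dict[str, float]:
    # Steps 2-3: IN of every II, then merge intersecting INs.
    ins = merge_intervals([(s - n_pre, e + n_post) for s, e in iis])
    # Step 4: each merged IN starts n_pre frames before its (merged) II,
    # so IBN = [IN start, IN start + n_pre + n].
    ibns = [(s, s + n_pre + n) for s, _ in ins]

    def inside_any(frame: int, intervals: List[Interval]) -> bool:
        return any(lo <= frame <= hi for lo, hi in intervals)

    # Step 6: one TP per IBN containing an alarm, one FN otherwise;
    # every alarm outside all INs (and all IBNs, by assumption) is an FP.
    tp = sum(1 for lo, hi in ibns if any(lo <= a <= hi for a in alarms))
    fn = len(ibns) - tp
    fp = sum(1 for a in alarms
             if not inside_any(a, ins) and not inside_any(a, ibns))

    # Step 7: standard precision and recall (0.0 when undefined, by assumption).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"TP": tp, "FN": fn, "FP": fp,
            "precision": precision, "recall": recall}


# Hypothetical video: the first two IIs are 3 frames apart, so their INs merge.
scores = evaluate_edge_level(
    iis=[(100, 200), (203, 300), (1000, 1200)],
    alarms=[110, 115, 250, 600, 1100],  # Step 5: alarms from the PIDS
    n=25, n_pre=3, n_post=3,            # Step 1
)
print(scores)
```

In this example, the first two IIs are evaluated as a single merged II detected by the alarm at frame 110, the third II is an FN because its only alarm (frame 1100) arrives after its IBN, and the alarm at frame 600 is the only FP.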
