none, this strategy replaces a missing value by calculating the majority value recorded for the item’s k nearest neighbors.
The number of neighbors to consider is k. The system selects k previous and next nearest neighbors to calculate the missing value. If a tie occurs between two values, the algorithm selects
the lowest timestamp value.
For example, the system monitors if meter 1 is on or off reporting a value of true (on) or false (off) every 15 minutes (the
interval). If a value for 2:15 is missing between time stamps 2 pm and 3 pm, and k = 3, the system finds the three nearest timestamps to 2:15, which are 1:45, 2:00, and 2:30 and takes the majority of these three
values. The table records these values:
| Timestamp | Meter 1 on state |
|---|---|
| 1:30 | true |
| 1:45 | false |
| 2:00 | true |
| 2:15 | false (interpolated value) |
| 2:30 | false |
| 2:45 | false |
In the example, the majority value, considering the three neighbors, is “false.” The system assigns this value to 2:15.
A tie can occur with numeric, Boolean and enum data. The system handles a tie in a particular way:
k) of nearest neighbors exists, the system gives preference to the value from the record with the timestamp nearest to the
missing record.In Table 1, k=4, its preceding nearest neighbors’ majority value is zero. Its succeeding nearest neighbors’ majority value is 1. Using rule
2 above, the system breaks the tie by assigning the missing value to the same value as the preceding timestamp (1:45).
| Timestamp | Enum data |
|---|---|
| 1:30 | 0 |
| 1:45 | 0 |
| 2:00 | 0 (interpolated value) |
| 2:15 | 1 |
| 2:30 | 1 |
In Table 2, k=1, its preceding nearest neighbor’s value is 1. Its succeeding nearest neighbor’s value is 3. The system recorded both values
at the same interval before and after the missing value, so it gives preference to the preceding timestamp, and sets the missing
data value to: 1.
| Timestamp | Enum data |
|---|---|
| 1:30 | 0 |
| 1:45 | 1 |
| 2:00 | 1 (interpolated value) |
| 2:15 | 3 |