Suitability of Non-Contact Infrared Thermometers (NCITs) for COVID-19 Screening

Sarthak Kumar, Maximilian Wagner, William Pozehl, James P. Bagian, M.D.

Introduction

With the onset of the COVID-19 pandemic in 2020 and its progression in the United States, the operation of many institutions and organizations was drastically curtailed. One of the first steps implemented to limit the spread was interim lockdowns that restricted in-person operations. To reduce the risk of in-person operations where remote work was not practical or possible, screening of individuals immediately prior to admission was instituted on a widespread basis to detect those infected with COVID-19. One screening tool suggested by the CDC, OSHA, and other organizations was the Non-Contact Infrared Thermometer (NCIT), used to detect individuals with elevated body temperatures that might indicate active infection or disease.

We were concerned because we were aware of the limitations of NCIT use for screening purposes, which stem both from technical limitations of the devices themselves and from the importance of user technique in securing the best results from the device. When used for screening purposes, the NCIT and the manner in which it is operated must be viewed as a system, and this system has multiple vulnerabilities that may limit its use as a screening tool.

 

NCIT Temp Scan
Source: State of Michigan Department of Health and Human Services Public Service, April 2nd, 2021

Problem statement

“Are Non-Contact Infrared Thermometers suitable for COVID-19 screening?”

Approach

To address the question of suitability of NCIT use for screening we explored the following questions:

How do NCITs perform in actual real-world use?

To assess the real-world use of these devices and test the validity of their temperature measurements, we collected data on 2,370 individuals from two screening stations on the University of Michigan North Campus over 8 consecutive days in August 2020. The screening process at these stations was not interfered with, and the screeners continued to take NCIT measurements in the same manner as they had for over two months. We deliberately did not interfere with the procedures or operation of the checkpoints because we wanted to capture NCIT readings that were representative of the screening system in its entirety.

Because we decided not to perturb the existing screening operations (doing so could have introduced observer-driven artifacts), we needed a reference against which to compare the NCIT readings in order to assess in-use performance and suitability. We chose to use literature-based temperatures of normal individuals. The traditional ‘normal’ temperature, per the 150-year-old Wunderlich criterion, is 98.6 F. However, our review of the literature (Geneva et al., 2019) revealed that the mean body temperature in a healthy population is 97.7 F, which is substantially lower than the traditional 98.6 F mark.

We simulated a distribution of 5,257 data points based on the mean and standard deviation derived from the literature to generate the population temperature distribution shown by the blue line in the chart above. The orange line shows the NCIT data collected from the screening checkpoints on the University of Michigan North Campus. The mean NCIT temperature recorded at these screening stations was 96.9 F, which is 0.91 F lower than the literature-based mean of 97.7 F for a healthy body temperature.
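As an illustration only, the simulation step described above can be sketched in Python as follows. The mean of 97.7 F and the 5,257 data points come from the text. The standard deviation is not stated in this section, so the 0.7 F used below is a placeholder, and the function name, the normal-distribution assumption, and the random seed are likewise assumptions of this sketch rather than a description of the original analysis.

```python
import random
import statistics

REFERENCE_MEAN_F = 97.7  # literature-based mean from the text (Geneva et al., 2019)
ASSUMED_SD_F = 0.7       # ASSUMPTION: placeholder, the SD is not given in the text
N_POINTS = 5257          # number of simulated data points from the text


def simulate_reference_temperatures(mean_f=REFERENCE_MEAN_F,
                                    sd_f=ASSUMED_SD_F,
                                    n=N_POINTS,
                                    seed=0):
    """Draw n body temperatures (F) from a normal distribution (assumed shape)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mean_f, sd_f) for _ in range(n)]


reference_sample = simulate_reference_temperatures()
print("n    =", len(reference_sample))
print("mean =", round(statistics.mean(reference_sample), 2))
print("sd   =", round(statistics.stdev(reference_sample), 2))
```

A histogram or density estimate of such a sample would play the role of the blue line in the chart.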

A chi-square goodness-of-fit test confirmed that the observed NCIT screening results (orange line) differed significantly from the literature-based temperatures (blue line) at a significance level of alpha = 0.05.

This demonstrated that the NCITs, as used at the screening checkpoints, produced suspiciously low temperatures that were likely inaccurate, could result in many false negative screening results, and could fail to identify individuals who might be infected.
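A minimal sketch of how such a goodness-of-fit comparison might be set up is given below. It is not the original analysis: the checkpoint readings are replaced by synthetic values (mean 96.9 F from the text, with an assumed spread of 1.0 F), the reference is a fixed normal distribution with an assumed standard deviation of 0.7 F rather than a simulated sample, and the bin edges are arbitrary choices. The critical value 12.592 is the standard chi-square table value for 6 degrees of freedom at alpha = 0.05.

```python
import random
from bisect import bisect_right
from statistics import NormalDist

# Standard chi-square table value for df = 6 at alpha = 0.05.
CRITICAL_VALUE_DF6 = 12.592


def chi_square_gof(readings, reference, inner_edges):
    """Chi-square goodness-of-fit statistic of readings against a reference
    distribution, using bins (-inf, e1), [e1, e2), ..., [ek, +inf)."""
    counts = [0] * (len(inner_edges) + 1)
    for value in readings:
        counts[bisect_right(inner_edges, value)] += 1
    # Cumulative reference probability at each bin boundary.
    cumulative = [0.0] + [reference.cdf(edge) for edge in inner_edges] + [1.0]
    n = len(readings)
    expected = [n * (hi - lo) for lo, hi in zip(cumulative[:-1], cumulative[1:])]
    statistic = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return statistic, len(counts) - 1  # df = number of bins - 1


# ASSUMPTION: reference SD of 0.7 F is a placeholder, not a value from the text.
reference = NormalDist(mu=97.7, sigma=0.7)

# ASSUMPTION: synthetic stand-in for the 2,370 checkpoint readings.
rng = random.Random(0)
synthetic_readings = [rng.gauss(96.9, 1.0) for _ in range(2370)]

inner_edges = [96.5, 97.0, 97.5, 98.0, 98.5, 99.0]  # 7 bins, so df = 6
statistic, dof = chi_square_gof(synthetic_readings, reference, inner_edges)
rejects_null = statistic > CRITICAL_VALUE_DF6
print("chi-square =", round(statistic, 1), "df =", dof, "reject:", rejects_null)
```

Rejecting the null hypothesis here means only that the binned readings are inconsistent with the assumed reference distribution; it says nothing by itself about the direction of the difference, which comes from comparing the means.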

The question is why?

We found that temperature measurement by NCITs is significantly altered by factors related to the operation of the device, such as the pre-measurement quiescent period of the subject being screened, the distance of the device from the subject’s forehead, the ambient temperature, and the ambient lighting conditions. The technical capability of the device itself should also be considered when taking measurements. Of these potentially confounding influences, the distance from the NCIT to the subject’s forehead is particularly sensitive to the user’s technique. We found that changes in the distance from the forehead, relative to that recommended by the manufacturer, can substantially affect the temperature displayed by these devices.

The sensitivity of NCIT readings to the distance at which the devices were used led us to test the devices used on North Campus side by side, both within and outside the manufacturers’ guidelines. We tested these devices on subjects while adhering to the manufacturers’ recommendations, which included keeping the subject quiescent for 30 minutes at a controlled ambient temperature, and varied only the distance of use, to distances both within and beyond the manufacturer’s recommended distance from the forehead. The violin plots in the figure above clearly show the difference in device performance: one device shows extreme variation when operated outside the manufacturer’s recommended distance.

This brings us to our next question:

Are NCITs all the same?

To answer this question, we collected 5 different devices that were in use at some of the largest healthcare systems in the United States, including Michigan Medicine.

We operated these devices both within and outside the manufacturer’s recommended distance while observing the quiescent period requirement.

Note that, in both violin plots, 3 of the 5 devices evaluated were 510(k) cleared by the FDA.

Even though one of the devices (FDA 3, represented by the green violin plot) was FDA 510(k) cleared, its performance was extremely sensitive to distance variation, and it yielded more inaccurate readings than the non-FDA-cleared devices when used outside the manufacturer’s recommended distance. Observations of NCITs in actual use, both at the University of Michigan screening checkpoints and more broadly (including on government websites), show NCITs being routinely used at distances beyond those recommended by the manufacturers.

This shows that NCITs are not all alike in their performance, especially when used outside the conditions recommended by the manufacturer. It also shows that this difference in performance holds even among the FDA 510(k) cleared devices we tested when compared in ‘as-used’ circumstances outside recommended conditions.

Conclusion

We found that the NCITs, especially as used in actual operation, have a general tendency to underestimate the physiological body temperature.

Furthermore, the pre-measurement quiescent period is a significant factor in determining the performance of the device. This can be seen in the data taken on the University of Michigan North Campus. All the measurements were taken in the month of August, when the subjects entered the building without observing any pre-measurement quiescent period. This means that they likely had elevated body temperatures from both their activity level and the hot ambient environment outside, and despite this the NCIT temperatures of screened subjects read as low as 93.4 F.

Had they complied with the pre-measurement quiescent period recommended by the manufacturer for optimum device performance, the devices might have displayed even lower temperatures, further exacerbating the tendency to yield false negative screening results. By its very nature, a screening test is intended to err toward false positives rather than false negatives, so that a diseased individual is less likely to slip past, which is the reason that screening is done to begin with. Our results show that NCITs as commonly used have a far greater propensity to yield false negatives than false positives, which is the exact opposite of what is expected of a screening tool. Therefore, based on our work, we conclude that NCITs as commonly used are poorly suited as COVID-19 screening tools.
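To make the false-negative argument concrete, the short sketch below estimates how a downward reading bias changes the chance of missing a febrile subject. It is a hypothetical illustration, not a result from this study: the 100.4 F fever cutoff is the commonly cited CDC threshold, the 0.91 F bias is borrowed from the mean difference reported above, and the subject's true temperature (101.0 F), the reading noise (0.5 F), and the function name are assumptions chosen only for the example.

```python
from statistics import NormalDist

FEVER_THRESHOLD_F = 100.4  # commonly cited CDC fever cutoff (not stated in the text)


def miss_probability(true_temp_f, bias_f, noise_sd_f, threshold_f=FEVER_THRESHOLD_F):
    """Probability that a single reading falls below the threshold,
    i.e. a false negative when the subject is actually febrile.
    Readings are modeled as normal around (true temperature + bias)."""
    reading = NormalDist(mu=true_temp_f + bias_f, sigma=noise_sd_f)
    return reading.cdf(threshold_f)


# ASSUMPTION: hypothetical febrile subject at 101.0 F, reading noise SD of 0.5 F.
unbiased_miss = miss_probability(101.0, bias_f=0.0, noise_sd_f=0.5)
biased_miss = miss_probability(101.0, bias_f=-0.91, noise_sd_f=0.5)

print("miss probability without bias:", round(unbiased_miss, 2))
print("miss probability with -0.91 F bias:", round(biased_miss, 2))
```

Under these assumed numbers, a device that reads low turns a subject who would usually be flagged into one who is usually missed, which is the direction of error a screening tool is meant to avoid.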


References: