Internet of Video Things (IoVT): Next generation of video surveillance systems

Over the last two decades, the development of the IoT has demonstrated its immense potential: physical environments can be equipped for seamless cyber-physical interaction through smart objects integrated with new information and communication technologies. Today, however, things are no longer limited to personal items such as smartphones, tablets, and smartwatches. They have come to include large-scale smart objects and embedded sensors in our environment, connected through a gateway device that transmits their data to a processing center. Among these things are visual sensors, or cameras. These sensors are widely deployed, and the data they generate represents over 75% of IoT traffic, particularly in video surveillance systems. Because they generate such large volumes of visual data, video surveillance systems require significant storage resources, transmission bandwidth, and power.
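To give a sense of scale, here is a rough back-of-the-envelope sketch of the storage and bandwidth that continuously recording cameras would need. The 4 Mbit/s bitrate and the 100-camera deployment are illustrative assumptions, not figures from this article, and the function names are hypothetical.

```python
def daily_storage_gb(bitrate_mbps: float, cameras: int = 1) -> float:
    """Storage needed per day, in gigabytes (10**9 bytes), for continuous recording."""
    seconds_per_day = 24 * 60 * 60
    bits_per_day = bitrate_mbps * 1_000_000 * seconds_per_day * cameras
    return bits_per_day / 8 / 1_000_000_000


def aggregate_bandwidth_mbps(bitrate_mbps: float, cameras: int) -> float:
    """Total uplink bandwidth, in Mbit/s, if every camera streams at once."""
    return bitrate_mbps * cameras


# Assumed values for illustration only.
ASSUMED_BITRATE_MBPS = 4.0   # hypothetical per-camera video bitrate
ASSUMED_CAMERAS = 100        # hypothetical deployment size

per_camera_gb = daily_storage_gb(ASSUMED_BITRATE_MBPS)
fleet_gb = daily_storage_gb(ASSUMED_BITRATE_MBPS, ASSUMED_CAMERAS)
fleet_mbps = aggregate_bandwidth_mbps(ASSUMED_BITRATE_MBPS, ASSUMED_CAMERAS)

print(f"One camera: {per_camera_gb:.1f} GB/day")
print(f"{ASSUMED_CAMERAS} cameras: {fleet_gb / 1000:.2f} TB/day, {fleet_mbps:.0f} Mbit/s")
```

Under these assumptions, a single camera produces about 43 GB per day, and the whole deployment produces several terabytes per day. This is why sending every raw stream to a central processing center quickly becomes expensive.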

To meet these requirements, increase flexibility, and reduce the cost of deploying video surveillance systems, smart cameras are combined with the Cloud, Fog, Mist, and Edge computing paradigms and the wider IoT to form the next generation of video surveillance systems, called the “Internet of Video Things”, or simply IoVT. IoVT is a part of the IoT capable of efficiently processing large volumes of data, such as images and videos. Compared to conventional systems, a video surveillance system (VSS) in an IoVT framework is organized into multiple layers based on new computing paradigms such as Edge, Mist, Fog, and Cloud computing, which serve as an infrastructure for communication and decision making by capturing and analyzing rich contextual and behavioral information. IoVT aims to deliver a more efficient, flexible, and cost-effective video surveillance system adapted to smart cities’ new requirements regarding the safety and security of citizens in public and private places.

Figure 01: IoVT architecture (Extension of the IoT architecture from [1])

From the discussion above, we can define the IoVT as:

IoVT objects differ from IoT objects. They require more storage space because of the massive amount of data they generate, more computing power to process their complex data, and more energy and bandwidth to support their data traffic. Real-time deployment scenarios of IoVT include smart surveillance, smart cities, smart transportation, smart agriculture, and smart homes. The following table summarizes the main characteristics of the IoT and IoVT.

Table 01: IoT vs. IoVT

Two related concepts have appeared in recent years alongside “IoVT”: the Internet of Multimedia Things (IoMT) and the Internet of Surveillance Things (IoST).

Concerning the “IoMT,” Alvi et al. [2] define it as:

The “IoST,” by contrast, appears in a single article [3], which does not define the concept.

From the definition above, we can say that the IoMT covers the problems of all multimedia sensors and the massive data they generate, whereas IoVT focuses only on visual sensors, that is, cameras of all types and their video or image data. IoST, in turn, focuses on the application of video surveillance in the IoT. Finally, we can propose the hierarchy of these concepts as represented in the following diagram:

Figure 02: The proposed hierarchy of IoT, IoMT, IoVT, and IoST
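The proposed hierarchy can also be read as a chain of increasingly narrow scopes. The sketch below is one possible encoding, and it assumes the nesting follows the order given in the caption (IoT, then IoMT, then IoVT, then IoST). The `PARENT` mapping and the helper names are hypothetical and used only for illustration.

```python
# Each concept points to the broader concept that contains it.
# Assumed nesting: IoT > IoMT > IoVT > IoST.
PARENT = {
    "IoT": None,     # all connected things
    "IoMT": "IoT",   # multimedia sensors and their data
    "IoVT": "IoMT",  # visual sensors (cameras) and image or video data
    "IoST": "IoVT",  # video surveillance as an application
}


def lineage(concept):
    """Return the concept followed by every broader concept that contains it."""
    chain = []
    current = concept
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_within(concept, broader):
    """True if `concept` is the same as, or nested inside, `broader`."""
    return broader in lineage(concept)


print(" < ".join(lineage("IoST")))
print(is_within("IoVT", "IoMT"))
print(is_within("IoMT", "IoVT"))
```

Read this way, every IoVT concern is also an IoMT concern, but not the other way round, which matches the distinction drawn above between visual sensors and multimedia sensors in general.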


[1] Benrazek, Ala‐Eddine, et al. “An efficient indexing for the Internet of Things massive data based on cloud‐fog computing.” Transactions on Emerging Telecommunications Technologies 31.3 (2020): e3868.

[2] Alvi, Sheeraz A., et al. “Internet of multimedia things: Vision and challenges.” Ad Hoc Networks 33 (2015): 87–111.

[3] Gao, Zhifan, et al. “Trustful Internet of Surveillance Things Based on Deeply Represented Visual Co-Saliency Detection.” IEEE Internet of Things Journal 7.5 (2020): 4092–4100.
