This is not a how-to article on setting up a home video surveillance network, since even my neighbor, the non-technical guy, installed his own video surveillance system. Most of us have an understanding of how an IP-based video surveillance network works. What we want to cover is why all this phenomenal bandwidth we are creating takes video surveillance to another level, why that may or may not be a good thing, and how to apply this to a large-scale city video surveillance network.
A long time ago, video surveillance cameras used terms such as CIF (352×288 pixel resolution) and 4CIF (704×576). Computers used resolutions like VGA (640×480) and XGA (1024×768). The common denominator in all of these is the 4:3 screen ratio. Movie makers marched to their own drum with 16:9 ratios, and today the standard has settled at the 1080 level (1920×1080).
With the convergence of computers and video, it was obvious that compression methodology needed to be applied because of the limits of CD-ROMs and bandwidth. Various JPEG and MPEG compressions were developed until MPEG-2 became the most universally used compression method for DVD. However, bandwidth and storage limitations, along with increased processor power, drove compression through MPEG-4 (still one of the most popular) and others to the current standard of H.264.
So how does this relate to wireless? In the past and all around the country, cameras over wireless were either very low resolution (CIF), highly compressed (blocky or blurry), or ran at a low frame rate (5-10 frames per second, or fps). A lot of systems in remote areas, e.g. SCADA locations, didn’t even try to move video over the wireless system. Instead, they stuck an analog recorder with digital output locally, and then monitored only one or two cameras at a time remotely, regardless of how many were on-site, due to bandwidth limitations. Keeping in mind all the bandwidth we can deliver over wireless systems (see the past few chapters in this series, Tales From The Towers), the question becomes: what can we now do in the real world?
Three years ago, we deployed a video analytic system with 48 cameras across 100 square miles in North Las Vegas and Boulder City, Nevada using a Puretech PureActiv system and a SkyPilot 4.9GHz mesh system, so it’s not a new concept. These cameras deliver CIF resolution at about 12fps due to storage limitations. Since multi-megapixel cameras are the next hot thing, and since I’m involved in one of these projects right now, I can tell you where we are going next.
Start with the idea that the multi-megapixel IP cameras on the market today are affordable. For example, a 1080i outdoor camera from Axis – the 3334 or 1755 – costs around $1500 or less. There are many other products out there that are even less expensive, but I would test them to make sure they can deliver the frame rates you expect under similar conditions. We found that some of the less expensive cameras could deliver no more than 12fps even though they were rated at 30fps in that particular resolution mode.
The biggest issue is how to use all that video quality. For live displays, we are probably going to have to limit live viewing to CIF resolution to get 16 cameras on a single display. With 50 cameras, one way to set it up might be five displays: four showing reduced-size images and one for a full-size image. That negates having HD video. You can use whatever variation on this you want, even a whole wall of monitors. But no matter what you do, past a few cameras there will be a point where there are too many screens for anyone to watch simultaneously, or the real-time images are too small to have value. In reality, there is no cost-effective way to display and watch 50 high-resolution cameras. So where is the value?
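As a quick sketch of that display math (the 16-thumbnails-per-display grid and camera counts come from the text above; the helper function name is mine):

```python
import math

def monitors_needed(cameras: int, thumbnails_per_display: int = 16,
                    spotlight_displays: int = 1) -> int:
    """Displays needed for a wall of reduced-size (CIF) thumbnails,
    plus one or more full-size 'spotlight' displays."""
    grid_displays = math.ceil(cameras / thumbnails_per_display)
    return grid_displays + spotlight_displays

print(monitors_needed(50))  # 4 thumbnail grids + 1 full-size display = 5
```

The count grows linearly with camera count, which is exactly why a big deployment can't rely on live human viewing.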
In addition to broadcasting a 1920×1080 video stream or higher, the newer cameras can also capture stills at up to 5 megapixels. That makes for some fairly impressive images and opens up all sorts of possibilities if you can get it back to a central location for processing. That’s where our big wireless pipes start having value. Imagine the camera shooting a snapshot every 20 seconds to augment the high-quality video stream for forensic evidence at trial, and dumping these images on a central server.
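To put rough numbers on that idea: the 20-second interval is from the text, but the ~1.5 MB-per-snapshot figure below is my assumption (a plausible size for a 5 MP JPEG at moderate compression), so treat this as a sketch, not a spec.

```python
# Back-of-envelope budget for periodic 5 MP snapshots.
SNAPSHOT_BYTES = 1.5e6   # assumed JPEG size for a 5 MP frame (not from the article)
INTERVAL_S = 20          # one snapshot every 20 seconds (from the article)
CAMERAS = 50

avg_mbps_per_cam = SNAPSHOT_BYTES * 8 / INTERVAL_S / 1e6
gb_per_cam_per_day = SNAPSHOT_BYTES * (86400 / INTERVAL_S) / 1e9

print(f"average bandwidth per camera: {avg_mbps_per_cam:.2f} Mbps")
print(f"fleet average for {CAMERAS} cameras: {avg_mbps_per_cam * CAMERAS:.1f} Mbps")
print(f"storage per camera: {gb_per_cam_per_day:.1f} GB/day")
```

Under those assumptions the snapshot traffic averages well under 1 Mbps per camera, so it rides comfortably alongside the 7-8 Mbps H.264 stream; the real cost is server storage, at several GB per camera per day.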
Currently, most people utilize this much resolution for forensic use. An accident is usually going to look much the same on video in HD as in CIF; in fact, the higher frame rate has more value than the resolution. However, the higher image quality might tell us who was driving if that becomes an issue, or it might reveal details such as a braking point based on a car nosing down, something the lower resolution may not. In reality, most megapixel cameras can deliver both high frame rates and HD quality.
I’m finishing our first deployment right now, where all the cameras are HD quality on the fixed units and 4CIF or better on the PTZ cameras (HD PTZ cameras weren’t available from Axis when we started the project). With the cameras set to 1920×1080, 20fps, 30% compression, using H.264, we are seeing about 7-8Mbps per camera.
There are two areas where the higher resolution system has a greater advantage. The first is in the use of forensic evidence at a court trial. If the video shows a person with discernible features, there is a higher chance of prosecution. With CIF cameras, that means either very short ranges or very small viewing areas.
The second and more important use is in the field of video analytics. Video analytics uses a computer to analyze a video stream and look for specific types of activity. It basically turns video surveillance from a forensic device into a proactive tool. Video analytics have been used in airports and depots to look for loiterers or abandoned luggage. More expensive analytic systems obviously have more features, such as license plate recognition and facial recognition. Some video analytic systems can even detect the emotional state of the subject or look for aberrant behavior.
The limitations on analytics have always been resolution, processing power, and algorithms. At lower resolutions, you can’t make out enough detail for facial recognition or license plates at any distance. Moreover, getting enough bandwidth over wireless (remember, this is a wireless series, not a wired series) to transmit large amounts of data has always been a challenge. At the same time, as the resolution increases, the processing power needs to increase. For example, it takes 4 times as much processing power to handle a 4CIF video stream as it does a CIF video stream. Scale that up to 1080 HD resolution, and an older dual-Xeon server that could handle 8 CIF streams 3 years ago can’t even handle one HD stream.
Fortunately, between Intel and the gaming industry, the answer is right before us. Newer Intel processors built on the Core i7 architecture have some pretty massive power. Jump to the Xeon version of that processor series and it’s running 6 physical cores plus 6 virtual (Hyper-Threading) cores. Double up the Xeon processors and you have more than sufficient horsepower to do any type of high-level video analytics.
Since video analytic processing isn’t any different from game processing in terms of the type of hardware needed, the gaming industry has pretty much given us the answer. High-power video cards, or GPUs (Graphics Processing Units), can be stacked to multiply the processing power. In fact, it’s possible to put 4 GPUs in the same computer, a machine capable of cracking weak AES encryption in minutes or hours. Maximum PC built a three-card version of this exact computer. Obviously you would want a different hard drive and storage combination, but if the software supports the GPUs, here’s the answer.
Improved analytic engines also have the ability to do object recognition. Imagine an Amber Alert that has every camera in the city scanning in real time for a specific make, model, and color of vehicle, in addition to license plates, to try to find a child. All of this advanced capability requires either lots of CIF cameras at very short distances for clarity or fewer cameras with very high definition, plus lots of bandwidth to get this data back to a central location. If it’s wireless, that has historically been even more difficult.
The traffic surveillance system design we used in Sahuarita, Arizona was based on three things:
(1) Bandwidth capacity
(2) Capability, currently and in the future
(3) System Expansion
Seven to eight megabits per camera meant that they needed a lot of capacity. Originally the design involved 4 access points (APs) with sector antennas covering 360 degrees and delivering up to 400Mbps or more (I told you we would get back to the wireless part of the equation eventually). Although the capacity was sufficient when it was originally installed, the RF environment changed while we were finishing the system. I covered interference issues with the local WISP in an earlier article, and after my experience with Atlanta, I decided to change this design over as well.
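To see how much headroom that capacity buys, here is a rough demand model. The 8 Mbps fixed-camera figure and 13-light count come from this article; the assumption that each light carries one fixed camera plus one paired PTZ at roughly 4 Mbps is mine, for illustration only.

```python
# Rough aggregate-demand estimate for the traffic-light camera backhaul.
LIGHTS = 13
FIXED_MBPS = 8   # ~7-8 Mbps per HD H.264 stream (from the article)
PTZ_MBPS = 4     # assumed bitrate for the 4CIF-or-better PTZ stream

demand = LIGHTS * (FIXED_MBPS + PTZ_MBPS)
for capacity in (400, 800):  # original design vs. upgraded system
    print(f"{capacity} Mbps backhaul: demand {demand} Mbps, "
          f"headroom {capacity - demand} Mbps")
```

Even under these assumptions the cameras use well under half of the original 400 Mbps design, which is what leaves room for multi-light hops, extra cameras, and the mobile-vehicle bandwidth mentioned later.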
With an equipment change of less than $2000, we expanded the capacity to 800Mbps and simultaneously reduced the noise floor from -75 to -92dB or better. Most lights are now PTP links, either to City Hall or between each other. Since the use of highly directional antennas on the main building means my beam patterns are now 6 degrees or less, frequency reuse isn’t an issue. I haven’t used the building as my antenna isolation shield yet, but that’s coming next as we add more traffic lights.
Uneven terrain also meant AP hopping wasn’t an option. Since budget was an issue and we already had some of the infrastructure in place, we stayed with the Ubiquiti equipment. Technically this is now a combination PTP/PTMP design. I didn’t use WDS, since I needed security features that won’t work with WDS on the Ubiquiti products. And because the Rockets and Nanostations cost less than $100, the highest-cost node would be a pole with a Rocket M5 and an MTI dual-polarity 5.8GHz flat-panel antenna for about $350. However, as the deployment went in, we made some changes and are now using PowerBridges in place of the Rocket/MTI antenna combinations as they have become available. The result of this design is that every light has an MCS15 2×2 MIMO link, either directly back to City Hall or in a hop path between lights, using the Rockets, Nanostations, and Nanostation Locos. The total cost of all the radio and antenna equipment for 13 traffic lights and 800Mbps of total capacity at City Hall will be less than $10,000, including the 2.4GHz WiFi system that went in simultaneously.
The capability of the system, although it’s still being installed, will provide some excellent prosecutorial evidence when needed. In the case of accidents, the combination of the resolution of the fixed cameras and the PTZ cameras paired with them will give traffic and public safety the information they need to respond appropriately. If there is a hit-and-run, the fleeing driver will have a more difficult time getting away when there are high-resolution images of the vehicle and, when available, the plate. If the driver leaves the vehicle, the planned video analytic software, with virtual tracking via the PTZs, is going to keep the driver in camera view much longer, so police can get a better picture for recognition.
Another important feature of these cameras is that they have audio capability. We already apply analytics to gunshot-detection and window-breakage applications. Throw in some audio cues for a crash, add a video analytic rule for two objects trying to occupy the same area at the same time (a crash), and false alerts drop.
There is no real growth limit to the system. On the bandwidth side, each traffic light has the capacity to hop several lights, if necessary, or add additional cameras. On the image side, as computer processing power continues to increase, the resolution and bandwidth are already in place to take advantage of it. This means more sophisticated surveillance tools for traffic and law enforcement, plus wireless bandwidth for mobile vehicles. Video analytics are the best way to use the increased image resolution that the added bandwidth capacity can deliver.