Real-Time TV Program Distribution and Storage Server with Keyword Access Capability
Chapter 2. Related Work
To better understand modern video streaming technology, we must consider both its target applications and the emerging protocol standards developed for it.
In the multimedia industry, the term "video server" usually means a specialized hardware box with large, fast and reliable storage, fast and flexible outbound network connections, and a simple programming interface. For example, the MediaHawk [HAWK] server is a medium-sized box capable of supporting over 1,000 digital video streams. This number suggests that the server mainly targets an ISP site serving a small town with several thousand potential subscribers.
Another example is the more flexible solution from nSTREAMS [NSTR]. They offer a modular system with separate storage and media delivery modules. In addition, a central control module performs load balancing and improves the reliability of the whole system. When fast MPEG decoders are not available on the client side, a digital-to-analog video converter is connected to a regular TV set. Unlike our implementation, this system offers scheduled playback, which may be useful for a non-stop video show in public places such as an airport waiting hall. In short, this solution seems targeted at out-of-city ISPs, whose clients are dispersed over a large area and require a distributed solution with good load balancing. HP's video server division, which was sold to Pinnacle Systems [HP], provides a solution similar in performance and functionality.
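The cited material does not describe how the nSTREAMS control module balances load, so the following is only a minimal sketch of one common policy (least relative load among healthy modules) to show how a central controller can provide both load balancing and reliability. The `DeliveryModule` class and `pick_module` function are hypothetical names, not an nSTREAMS API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeliveryModule:
    """One media delivery module (hypothetical model, not an nSTREAMS API)."""
    name: str
    capacity: int          # maximum concurrent streams the module can serve
    active: int = 0        # streams currently being served
    healthy: bool = True   # set to False when the module stops responding


def pick_module(modules: List[DeliveryModule]) -> Optional[DeliveryModule]:
    """Assign a new stream to the healthy module with the lowest relative load."""
    candidates = [m for m in modules if m.healthy and m.active < m.capacity]
    if not candidates:
        return None  # every module is full or down: reject the request
    best = min(candidates, key=lambda m: m.active / m.capacity)
    best.active += 1  # account for the stream just assigned
    return best
```

Skipping unhealthy modules is what turns the balancer into a reliability mechanism: a failed delivery module simply stops receiving new streams while the others absorb the load.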
Darim Vision's local area network multicasting server [DARM] is the closest to our work in nature, but it does not provide any of the high-level access capabilities that SBSVS supports.
Most push video servers on the market use the RTP payload format for MPEG1/MPEG2 (RFC 2250) to stream multimedia over the network. This format is a packetization scheme for MPEG1/2 video and audio streams in which each packet header carries enough information for the receiver to decide how to play back that packet. The RTP payload format does not specify the underlying network layer and may be built on top of UDP.
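To make "enough information to decide on playback" concrete, here is a minimal sketch that parses the 12-byte RTP fixed header (RFC 3550) followed by the 4-byte MPEG video-specific header of RFC 2250, and then applies one illustrative playback rule. The class and function names are hypothetical, and the drop-B-pictures rule is an assumed receiver policy, not something the format mandates.

```python
import struct
from dataclasses import dataclass

RTP_VERSION = 2
PICTURE_TYPES = {1: "I", 2: "P", 3: "B", 4: "D"}  # RFC 2250 "P" field values


@dataclass
class MpegVideoPacket:
    payload_type: int   # 32 is the static payload type for MPEG video
    marker: bool
    sequence: int
    timestamp: int      # 90 kHz clock for MPEG video
    ssrc: int
    temporal_ref: int   # temporal reference copied from the MPEG picture header
    picture_type: str   # "I", "P", "B", "D", or "?" if the field is unknown
    seq_header: bool    # S bit: an MPEG sequence header is present in this packet
    payload: bytes


def parse_mpv_packet(packet: bytes) -> MpegVideoPacket:
    """Parse the RTP fixed header plus the MPEG video-specific header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    end = len(packet)
    if b0 & 0x20:                      # P bit: last byte counts padding bytes
        end -= packet[-1]
    offset = 12 + 4 * (b0 & 0x0F)      # skip the CSRC list
    if b0 & 0x10:                      # X bit: skip the header extension
        if end < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    if end < offset + 4:
        raise ValueError("missing MPEG video-specific header")
    (mh,) = struct.unpack("!I", packet[offset:offset + 4])
    return MpegVideoPacket(
        payload_type=b1 & 0x7F,
        marker=bool(b1 >> 7),
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        temporal_ref=(mh >> 16) & 0x3FF,
        picture_type=PICTURE_TYPES.get((mh >> 8) & 0x7, "?"),
        seq_header=bool((mh >> 13) & 1),
        payload=packet[offset + 4:end],
    )


def keep_packet(pkt: MpegVideoPacket, behind_schedule: bool) -> bool:
    """Assumed policy: a late receiver discards B pictures first,
    since no other picture uses them as a reference."""
    return not (behind_schedule and pkt.picture_type == "B")
```

Because the picture type sits in a fixed position of every packet, the receiver can make this decision without touching the compressed MPEG data itself.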
RFC 2326 specifies the Real Time Streaming Protocol (RTSP), a protocol for requesting and controlling video content. It is an application-level protocol for controlling the delivery of data with real-time properties, and it is used to negotiate the delivery of video. RTSP is technically similar to HTTP, and deliberately so: this makes it possible to extend web servers, web proxies and gateways to support RTSP alongside HTTP. HTTP's public-key and certificate security schemes may be reused for RTSP as well.
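The HTTP-like syntax is easiest to see on the wire. The following minimal sketch serializes an RTSP/1.0 request and parses a response; the helper names, the host `video.example.com`, and the session id are hypothetical. The `Range: npt=120-` header (normal play time) asks the server to start playback 120 seconds into the presentation.

```python
def build_request(method, url, cseq, headers=None):
    """Serialize an RTSP/1.0 request: request line, CSeq, headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(text):
    """Split an RTSP response into (status code, reason, headers, body)."""
    head, _, body = text.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    if not version.startswith("RTSP/"):
        raise ValueError("not an RTSP response")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), reason, headers, body


if __name__ == "__main__":
    print(build_request("PLAY", "rtsp://video.example.com/news", 3,
                        {"Session": "12345678", "Range": "npt=120-"}))
```

Apart from the `RTSP/1.0` version tag and the mandatory `CSeq` header, this is the same request-line-plus-headers framing an HTTP parser already handles, which is why proxies and gateways can be extended rather than rewritten.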
The Internet standards define two separate protocols, one for data access control (RTSP) and one for payload delivery (RTP), which suggests a video delivery architecture that separates a centralized RTSP-enabled control center from the video push servers. Our approach is similar: clients access metadata and initiate the video content delivery (a "conference", as defined in the RFC) through the centralized server and then receive the video from the appropriate "push server".
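A minimal sketch of this control/push split, assuming a hypothetical `ControlCenter` class, a catalog keyed by program id, and a region-to-host table with hypothetical host names: the control center answers metadata queries and hands the client off to a push server, but never streams video itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redirect:
    push_host: str   # push server that will stream the program
    rtsp_url: str    # URL the client then opens on that push server


class ControlCenter:
    """Hypothetical central server: holds metadata only and never streams video."""

    def __init__(self, catalog, push_servers):
        self.catalog = catalog            # program id -> title (metadata)
        self.push_servers = push_servers  # client region -> push host; must include "default"

    def find(self, word):
        """Metadata query: ids of programs whose title contains the word."""
        w = word.lower()
        return sorted(pid for pid, title in self.catalog.items() if w in title.lower())

    def locate(self, program_id, region):
        """Pick the push server for the client's region and hand the client off to it."""
        if program_id not in self.catalog:
            raise KeyError(program_id)
        host = self.push_servers.get(region, self.push_servers["default"])
        return Redirect(host, f"rtsp://{host}/{program_id}")
```

Keeping the catalog central while the push hosts are chosen per region is what allows push servers to be placed close to their clients without duplicating the metadata service.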
At present, we cannot find an implementation of this approach on the web: most well-known broadcasting sites still push low-bandwidth streams directly from their own location instead of spreading push servers across the country. For an example of a good streaming site, see the Digital Entertainment Network [DEN]. In another example, using the "traceroute" tool on the well-known and successful company "broadcast.com" [BCST], we can see that all its sites operate through the same provider host, "sl-audionet-2-0-0.sprintlink.net". This means that it does not try to improve media delivery quality by setting up a separate server for the NYC area; instead, it operates only from Texas. The typical stream bandwidth for the RealAudio player used by broadcast.com and cnn.com is 80,000 bps, while the MPEG1 stream delivered by SBSVS is 1.5 Mbps. SBSVS is meant to be used in an intranet environment and can therefore afford to support high-quality media stream playback.
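The two stream rates above can be turned into capacity figures with a little arithmetic. The sketch below assumes a 100 Mbps intranet link and ignores protocol overhead; both are illustrative assumptions, while the two bit rates come from the text.

```python
MPEG1_BPS = 1_500_000     # MPEG1 stream rate delivered by SBSVS (from the text)
REALAUDIO_BPS = 80_000    # typical RealAudio stream rate (from the text)
LINK_BPS = 100_000_000    # assumption: one 100 Mbps intranet link, overhead ignored


def streams_per_link(link_bps: int, stream_bps: int) -> int:
    """Whole number of concurrent constant-rate streams that fit in the link."""
    return link_bps // stream_bps


def megabytes_per_hour(stream_bps: int) -> float:
    """Storage for one hour of a constant-rate stream (1 MB = 10**6 bytes)."""
    return stream_bps * 3600 / 8 / 1_000_000


if __name__ == "__main__":
    print(MPEG1_BPS / REALAUDIO_BPS)                  # 18.75x more bandwidth per stream
    print(streams_per_link(LINK_BPS, MPEG1_BPS))      # 66 MPEG1 streams
    print(streams_per_link(LINK_BPS, REALAUDIO_BPS))  # 1250 RealAudio streams
    print(megabytes_per_hour(MPEG1_BPS))              # 675.0 MB per hour of TV
```

Under these assumptions, a single link that could carry over a thousand RealAudio listeners carries only a few dozen MPEG1 viewers, which is why 1.5 Mbps playback is practical on an intranet but not for nationwide Internet broadcasting.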
Another interesting work is Apple's Darwin Streaming Server, part of Darwin, Apple's free, open-source operating system project; the server is specifically designed to give the QuickTime streaming format a major boost on the server side [APPL]. As an open-source project, it provides a good developer framework for streaming QuickTime movies. Darwin supports the RTP and RTSP protocols. Apple has not yet published detailed documentation, but this project appears to be a very good base to extend with the functionality implemented in our work.
We know of no exact web analog of our approach, which uses closed-caption text as a simple and powerful way to improve access to the desired content. Transcripts of news programs can be searched on the CNN site, cnn.com, which helps to locate a video by keywords. However, this system does not use the text as part of the multimedia broadcast. All video broadcasters start playing a video from the very beginning, while we offer the choice to start playback at the fragment selected by the text search. CNN broadcasts only short programs a few minutes long, where synchronization between text and video matters far less than in SBSVS, which places no limit on TV program length.
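The idea of starting playback at a text-search hit can be sketched as an inverted index from caption words to broadcast times. This is a minimal illustration, not the SBSVS implementation: the `CaptionIndex` class, the `playback_start` helper, and the 5-second lead-in (to compensate for live captions typically lagging the audio) are all assumptions.

```python
import re
from collections import defaultdict


class CaptionIndex:
    """Inverted index: caption word -> broadcast times (seconds) where it occurs."""

    def __init__(self, captions):
        # captions: iterable of (start_seconds, caption_text) pairs
        self.postings = defaultdict(list)
        for start, text in captions:
            # set() so a word repeated within one caption is recorded once
            for word in set(re.findall(r"[a-z0-9']+", text.lower())):
                self.postings[word].append(start)

    def find(self, keyword):
        """Sorted start times of captions containing the keyword."""
        return sorted(self.postings.get(keyword.lower(), []))


def playback_start(caption_time, lead_in=5.0):
    """Seek point for the player: a few seconds before the caption, since live
    captions usually trail the audio (lead_in is an assumed value)."""
    return max(0.0, caption_time - lead_in)
```

A client would pick one of the times returned by `find` and request playback from `playback_start(t)`, for example as an RTSP `Range: npt=...-` header, instead of from the beginning of the program.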
Virage [Virage], a small company, offers Web developers a product that uses closed captions and speech recognition to index and search video. The company's technology can also split a video into separate "shots" so that video engineers can jump from one scene to the next. It partners with major web search providers and news sites to combine video with keyword search navigation. This service will soon appear on the ABCNEWS.com, CNET.com and C-Span.com sites.
The most outstanding work in the video streaming area is the Terabyte Digital Video Library at Carnegie Mellon University [CMU]. It is the only project we have found that pays close attention not just to storing and playing back video, but to developing new technology for searching through vast media collections and retrieving the most relevant selections. This research belongs to the area of artificial intelligence: the stored video is postprocessed to extract additional information. First, as much text as possible is extracted; the sources of text in video programs are closed captions, speech recognition and video optical character recognition (OCR). Second, the video is processed to find the most informative shots for use in indexing.
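How CMU detects shot boundaries is not spelled out here, so the following sketch swaps in a common textbook technique instead: compare intensity histograms of consecutive frames and declare a cut when they differ by more than a threshold, then take each shot's middle frame as its representative. Frames are assumed to be equal-sized, non-empty flat lists of 0–255 grayscale values; the threshold, bin count, and middle-frame rule are illustrative assumptions.

```python
def histogram(frame, bins=16):
    """Count grayscale pixel values (0..255) into equal-width bins."""
    counts = [0] * bins
    for v in frame:
        counts[v * bins // 256] += 1
    return counts


def hist_distance(h1, h2):
    """Normalized L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * sum(h1))


def segment_shots(frames, threshold=0.5, bins=16):
    """Return (first, last) frame indices of each shot."""
    if not frames:
        return []
    hists = [histogram(f, bins) for f in frames]
    shots, start = [], 0
    for i in range(1, len(frames)):
        if hist_distance(hists[i - 1], hists[i]) > threshold:  # abrupt cut
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frames) - 1))
    return shots


def key_frames(shots):
    """Assumed rule: the middle frame of each shot represents it in the index."""
    return [(first + last) // 2 for first, last in shots]
```

Histogram comparison ignores object motion within a shot, which makes it robust for cuts; gradual transitions such as fades would need a more elaborate detector.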
The paper in the IEEE Computer journal [CMU] describes many innovative heuristics used to extract additional information. For example, the authors observed that text on video frames tends to stay on the screen while other parts of the picture change frequently, and that text regions are likely to be rectangular. Once text areas are detected, a set of related consecutive frames is used to improve the text quality, and then a commercial OCR package is applied. Another example is the way they partitioned video segments into shots and selected a representative frame for each shot for video indexing. This
The only drawback of the CMU system is the enormous system administration effort required to maintain it, which also makes it very hard to replicate.
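The CMU text-detection heuristic described above (text stays still while the rest of the picture changes, and consecutive frames are combined to improve text quality) can be sketched as follows. This is a simplification, not the CMU algorithm: frames are assumed to be equal-sized 2-D lists of grayscale values, the tolerance is an assumed value, all stable pixels are merged into a single bounding rectangle (a real system would separate connected rectangular regions), and the OCR step itself is not shown.

```python
def stable_mask(frames, tolerance=8):
    """True where a pixel varies by at most `tolerance` across all frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [f[r][c] for f in frames]
            mask[r][c] = max(values) - min(values) <= tolerance
    return mask


def bounding_box(mask):
    """(top, left, bottom, right) enclosing all stable pixels, or None."""
    cells = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not cells:
        return None
    rs = [r for r, _ in cells]
    cs = [c for _, c in cells]
    return min(rs), min(cs), max(rs), max(cs)


def averaged_crop(frames, box):
    """Average the candidate text region over all frames to suppress noise
    before handing it to an OCR engine."""
    top, left, bottom, right = box
    n = len(frames)
    return [[sum(f[r][c] for f in frames) / n for c in range(left, right + 1)]
            for r in range(top, bottom + 1)]
```

Averaging works because the text pixels are nearly identical from frame to frame while noise is not, so the mean sharpens the characters that the OCR engine will see.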