By Stanislav, a Senior .NET Developer on Murano Software’s team
The latest version of Silverlight delivers hundreds of features, such as a full set of form controls, enhanced data-binding support, a printing API, UDP, webcam, and microphone support, and full trust for out-of-browser applications. However, it has no out-of-the-box solution for streaming video or audio captured from a microphone and webcam over a network. The current API lets you capture video and audio data from media devices and play media back using MediaStreamSource and the MediaElement control. The API also includes the UdpAnySourceMulticastClient class, a client receiver for multicast traffic from any source, also known as Any Source Multicast (ASM) or Internet Standard Multicast (ISM).
Silverlight 4 adds support for microphones and webcams and gives us access to their raw streams. This opens up many opportunities for new types of applications.
During our work on http://code.msdn.microsoft.com/rca (source code is available under the MS-PL license), we created an application that allows users to communicate over the local network and to establish video/audio conferencing.
The classes that expose this functionality live in the System.Windows.Media namespace.
Two classes, AudioSink and VideoSink, give us access to audio and video.
To obtain video data from a video input device in Silverlight, you have to derive a custom video sink from VideoSink, which declares four abstract callbacks:
- OnCaptureStarted
- OnCaptureStopped
- OnFormatChange
- OnSamples
When you derive from VideoSink, you must override all four callbacks for the code to compile. The callback that delivers the captured data is OnSamples:
protected override void OnSamples(long sampleTime, long frameDuration, byte[] sampleData)
{
// Some code here...
}
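A minimal sketch of such a sink might look like the following. The class name SketchVideoSink and the FrameCaptured event are hypothetical names introduced for illustration; the project's own sink is UdpVideoSink, and its implementation may differ.

```csharp
using System;
using System.Windows.Media;

// Hypothetical sink for illustration only; not the project's UdpVideoSink.
public class SketchVideoSink : VideoSink
{
    // Last format reported by the capture device (size, stride, pixel format).
    private VideoFormat _format;

    // Hypothetical event that hands each raw frame to interested listeners.
    public event Action<long, long, byte[]> FrameCaptured;

    protected override void OnCaptureStarted() { }

    protected override void OnCaptureStopped() { }

    protected override void OnFormatChange(VideoFormat videoFormat)
    {
        _format = videoFormat;
    }

    protected override void OnSamples(long sampleTime, long frameDuration, byte[] sampleData)
    {
        // Forward the raw frame; a real sink would pass it to a media channel.
        var handler = FrameCaptured;
        if (handler != null) handler(sampleTime, frameDuration, sampleData);
    }
}
```

To receive samples, the sink's CaptureSource property has to be set to a CaptureSource whose VideoCaptureDevice is assigned, and the capture has to be started with CaptureSource.Start() after the user has granted device access.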
The idea was to split the data for each frame into a set of packets, number them, add some additional information, and send them over the network.
On the client side, the application receives these packets, puts them in order, reconstructs the frame, and passes the frames to MediaElement through MediaStreamSource.
The sources can be downloaded here. All logic related to the media chat is located in the PSO.Client.UDPMediaChat project (see Figure 1). It contains the following classes:
| Class name | Description |
| --- | --- |
| MediaFrame | Contains all data related to a media sample. |
| UdpAudioSink, UdpVideoSink | Capture raw media data and pass it to the appropriate media channel. |
| VideoPacketChannel, AudioPacketChannel | Prepare data for transmission: they split media samples into a set of packets and pass them to the transmitter. |
| NetPacketTransmitter | Contains the logic for sending and receiving NetPackets over the network. |
| NetAudioPacketSerializer, NetVideoPacketSerializer | Contain the packet serialization and deserialization logic. |
| NetVideoPacket, NetAudioPacket | Store all the media data that has been prepared for sending over the network. |
| StreamingServer | Contains logic that allows users to organize raw media data transmission over the network. |
| MediaFrameSource | Contains the media buffering logic. |
| RawMediaStreamSource | Contains the logic that prepares media samples for playback by MediaElement. |
Table 1. The description of the most important classes.
Figure 1. Solution explorer
Sequence diagrams show how the application processes media samples before sending them over a network.
We streamed raw data without any compression. Silverlight does not supply codecs yet, but we believe it should be possible to compress media streams using COM object access in trusted applications.
Figure 2 shows that we have two media data sources (VideoSink and AudioSink). Each passes media data to the corresponding media packet channel, which is responsible for processing the media frames (samples) and transmitting them. Note that these sinks work separately, in different threads. The media channel also contains the logic that splits frames into a set of packets. Each packet is sent over the network separately.

Figure 2. Sequence diagram showing a simplified algorithm for raw media streaming.
When you use a multicast client, the first thing you have to do is join the group, using a known multicast IP address (e.g., 224.0.0.1, the All Hosts multicast group, which contains all systems on the same network segment) and a port (e.g., 9999). The address block 224.0.0.0/24 (224.0.0.0 to 224.0.0.255) is designated for multicasting on the local subnetwork only. Once the connection has been made, you can start sending to and/or receiving from the group.
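The join, send, and receive steps can be sketched with UdpAnySourceMulticastClient as follows. The wrapper class MulticastSketch and the 1024-byte receive buffer are assumptions made for illustration; the project's NetPacketTransmitter is the real counterpart and may be organized differently.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical wrapper for illustration; not the project's NetPacketTransmitter.
public class MulticastSketch
{
    private readonly UdpAnySourceMulticastClient _client;
    private readonly byte[] _receiveBuffer = new byte[1024]; // assumed buffer size

    public MulticastSketch(string groupAddress, int port)
    {
        _client = new UdpAnySourceMulticastClient(IPAddress.Parse(groupAddress), port);
    }

    // Joins the group, then starts listening for datagrams.
    public void Join()
    {
        _client.BeginJoinGroup(result =>
        {
            _client.EndJoinGroup(result); // throws if the join did not succeed
            Receive();
        }, null);
    }

    // Sends one datagram to every member of the group.
    // Only valid after the join has completed.
    public void Send(byte[] packet)
    {
        _client.BeginSendToGroup(packet, 0, packet.Length,
            result => _client.EndSendToGroup(result), null);
    }

    private void Receive()
    {
        _client.BeginReceiveFromGroup(_receiveBuffer, 0, _receiveBuffer.Length, result =>
        {
            IPEndPoint source;
            int count = _client.EndReceiveFromGroup(result, out source);
            // The first 'count' bytes of _receiveBuffer hold the datagram from 'source'.
            Receive(); // keep listening
        }, null);
    }
}
```

With the example values from the text, the client would be created as `new MulticastSketch("224.0.0.1", 9999)` and `Join()` called before any `Send`.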
There are several security restrictions on connecting to multicast groups in Silverlight.
The security policy checks included in the Silverlight runtime are designed to prevent networking threats such as DoS attacks, DNS rebinding, and reverse tunnel attacks. Currently, it is not possible to connect to remote ports lower than 1024. Before a multicast client is allowed to join a group, the Silverlight runtime performs a protocol check to verify that the client has been granted permission to join the group and receive datagrams.
The sink delivers raw media data through the callback:
protected override void OnSamples(long sampleTime, long frameDuration, byte[] sampleData)
{
// Some code here...
}
This data is packed into a media frame and passed to the corresponding media channel. The channel splits the media frame into a set of packets because UdpAnySourceMulticastClient restricts the amount of data that can be sent at once. The application therefore cuts large frames of raw media data into small pieces and sends them over the network.
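The splitting step can be sketched in plain C# as follows. The Packet type, its field names, and the maxPayload parameter are assumptions made for illustration; the project's NetVideoPacket and NetAudioPacket classes carry their own fields, which are not reproduced here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical packet type for illustration only.
public sealed class Packet
{
    public int FrameId;    // frame this piece belongs to
    public int Index;      // position of the piece within the frame
    public int Count;      // total number of pieces in the frame
    public byte[] Payload; // the piece of raw frame data
}

public static class FrameSplitter
{
    // Cuts a frame into numbered pieces of at most maxPayload bytes each.
    public static List<Packet> Split(int frameId, byte[] frame, int maxPayload)
    {
        if (frame == null) throw new ArgumentNullException("frame");
        if (maxPayload <= 0) throw new ArgumentOutOfRangeException("maxPayload");

        int count = (frame.Length + maxPayload - 1) / maxPayload; // ceiling division
        var packets = new List<Packet>(count);
        for (int i = 0; i < count; i++)
        {
            int offset = i * maxPayload;
            int size = Math.Min(maxPayload, frame.Length - offset);
            var payload = new byte[size];
            Buffer.BlockCopy(frame, offset, payload, 0, size);
            packets.Add(new Packet { FrameId = frameId, Index = i, Count = count, Payload = payload });
        }
        return packets;
    }
}
```

Carrying both the piece index and the total count in every packet lets the receiver detect when a frame is complete even if the packets arrive out of order.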

Figure 3. Sequence diagram showing a simplified playback algorithm for the raw media stream.
On the client side, UdpAnySourceMulticastClient receives raw bytes. It passes them to NetPacketTransmitter. The transmitter verifies the destination address and “unpacks” the packet.
Then the transmitter notifies the media channels that the packet with media data has been received.
Each channel checks whether it can process the incoming packet. The logic of the channels can vary drastically, depending on the nature of the received data. For instance, VideoChannel uses a special mechanism for recovering video frames from a set of packets. When the media channel decides that the next frame has been received completely, it passes the frame to the MediaFrameSource.
MediaFrameSource manages media frame queues for both audio and video frames. Each received frame is added to the appropriate queue. The frames are pulled from the queues by the MediaStreamSource.
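One simple way to recover a frame from its packets is sketched below. This is not the project's recovery mechanism: the FrameAssembler class and its parameters are hypothetical, and the sketch makes the simplifying assumption that an incomplete frame is dropped as soon as a packet belonging to a different frame arrives.

```csharp
using System;

// Hypothetical assembler for illustration only.
public sealed class FrameAssembler
{
    private int _frameId = -1; // frame currently being assembled (ids assumed non-negative)
    private byte[][] _parts;   // received pieces, indexed by position
    private int _received;     // number of distinct pieces received so far

    // Returns the complete frame once its last missing piece arrives, otherwise null.
    public byte[] Add(int frameId, int index, int count, byte[] payload)
    {
        if (count <= 0 || payload == null) return null;

        if (frameId != _frameId)
        {
            // A packet of another frame arrived: drop the incomplete frame.
            _frameId = frameId;
            _parts = new byte[count][];
            _received = 0;
        }

        // Ignore out-of-range indexes and duplicates.
        if (index < 0 || index >= _parts.Length || _parts[index] != null) return null;

        _parts[index] = payload;
        _received++;
        if (_received < _parts.Length) return null;

        // All pieces are present: concatenate them in index order.
        int total = 0;
        foreach (var part in _parts) total += part.Length;
        var frame = new byte[total];
        int offset = 0;
        foreach (var part in _parts)
        {
            Buffer.BlockCopy(part, 0, frame, offset, part.Length);
            offset += part.Length;
        }
        return frame;
    }
}
```

Because UDP does not guarantee delivery, a frame with a lost packet never completes in this sketch; it is simply replaced when the next frame starts to arrive.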
If the queue is empty, the thread that serves MediaStreamSource is suspended until the queue receives at least one frame.
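The blocking behavior described above can be sketched with Monitor.Wait and Monitor.Pulse. The FrameQueue class is a hypothetical name used for illustration; the queues inside MediaFrameSource may be implemented differently.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical blocking queue for illustration only.
public sealed class FrameQueue
{
    private readonly Queue<byte[]> _frames = new Queue<byte[]>();
    private readonly object _sync = new object();

    // Called by the receiving side when a complete frame is available.
    public void Enqueue(byte[] frame)
    {
        lock (_sync)
        {
            _frames.Enqueue(frame);
            Monitor.Pulse(_sync); // wake one waiting consumer, if any
        }
    }

    // Called by the playback side; blocks until at least one frame is queued.
    public byte[] Dequeue()
    {
        lock (_sync)
        {
            while (_frames.Count == 0)
            {
                Monitor.Wait(_sync); // releases the lock while waiting
            }
            return _frames.Dequeue();
        }
    }
}
```

The while loop around Monitor.Wait rechecks the queue after every wake-up, so a consumer never dequeues from an empty queue even if another consumer took the frame first.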
Figure 5. Audio chat window
A user can call another user by clicking the phone icon on the left side of the main window. The other participant has to confirm the call. If the call is confirmed, a special media control is displayed (see Figure 5). By default, PSO establishes an audio conference. If you want to make a video call, press the film icon. Your application instance will then start a video transmission, and the other participant will be able to see you. The video chat window is shown in Figure 6.
Figure 6. Video chat window
We successfully built a proof of concept showing that video conferencing software can be developed on Silverlight 4. We also built a universal, extensible framework that allows users to transmit data over the local network.