Streaming media is flooding IP networks with video, audio, game, live TV, and educational streaming services. With the present pandemic brought about by coronavirus disease 2019 (COVID-19), social distancing could become the norm, and streaming media consumption could increase further. An easy subscription-based model and high access-network bandwidth, combined with cloud and CDN technologies, have made these services extremely successful. In addition, adaptive bit rate (ABR) streaming, supported by MPEG-DASH (Dynamic Adaptive Streaming over HTTP), and advances in encoding technologies such as MPEG-4 have made it possible to provide streaming services at reasonable cost. Because the streaming industry is highly competitive, measuring and benchmarking quality of service (QoS) from the client's point of view can give a service provider an edge. This study derives a model to compare the QoS of streaming services and uses it to compare the most prominent streaming service providers. Their infrastructure and delivery models are also surveyed extensively.
This paper presents a comparative investigation of QoS for existing streaming services, including Netflix, Amazon Prime, and YouTube. To measure QoS from the client machine, the system must accurately classify the traffic, identify the stream of interest, and then calculate QoS from parameters including throughput, buffering, startup delay, and TCP errors. Accurate traffic classification requires first studying the infrastructure and origins of the stream. Hence, the first part of the paper surveys the general infrastructure, protocols, and static-page features of these services. Several standards and protocols involved in end-to-end stream delivery were analyzed first; then the cloud and CDN infrastructures of the three services were compared to ascertain their core and edge caching mechanisms. An infrastructure study helps identify the correct stream and map the IP addresses of these service providers. Apart from protocols and service models, we briefly explain the Open Connect, CloudFront, and Google peering infrastructures behind these services.
The second part of the paper focuses on overall streaming behavior and QoS. There are two approaches to quality assessment. The first is subjective and is called quality of experience (QoE). QoE is measured using the mean opinion score (MOS) on a scale of 1 to 5; being subjective, it requires user-perception-related metrics. The second is objective and is called quality of service (QoS) [2,3]; it is measured using application-, network-, and transport-layer parameters. Several standards have been developed to translate a QoS model into MOS-based QoE models using mathematical equations [4,5]. Even though such standards have been developed and implemented by several providers, deducing the QoS of a service remains challenging and requires a large amount of measurement data.
Measuring QoS entirely from application-layer metrics related to video players is impractical, as it would require different techniques for different service providers. Application-layer metrics include startup delay, rebuffering events, and playback buffer level. Media Presentation Description (MPD) files or player APIs such as the YouTube IFrame API can be used to collect these data; however, MPD files are mostly encrypted, and most service providers do not offer player APIs. Hence, designing a generic QoS model based on application-layer metrics is challenging.
TCP throughput, video packet delay, jitter, and similar quantities can be used in network parametric models. In ABR streaming, however, these parameters are ineffective: the video is stored on the server encoded at several resolutions and bit rates, and the bit rate changes with network conditions, resulting in almost no jitter or packet loss. In addition, the playout buffer at the application layer absorbs much of the delay. As streaming technology adopts ABR and rate-controlled streaming, parametric models need to evolve. The measurement technique used to collect QoS metrics should also be generic across applications. Hence, in this paper, we rely entirely on streaming patterns to extract QoS metrics. The goal of this paper is to provide feature extraction techniques based on the patterns and characteristics of streaming. We first compare the QoS of static pages and then the QoS of streaming at a granular level.
The major QoS metrics in this paper are rebuffering time, average media bit rate, progressive download ratio, and the standard deviation (SD) of the On-Off cycle. These streaming services usually download content at a very high rate initially, in what is known as the “buffering state,” and later slow down to what is known as the “steady state.” The progressive download ratio relates the bit rates of these two phases. In the steady-state phase, the video is downloaded in On-Off cycles. The On cycle corresponds to the chunk size, also known as the block size. The Off cycle is the waiting time during which no data is downloaded, leaving the bandwidth free. The SD of the On-Off cycle shows whether the download bit rate during the session is consistent or jittery. QoS was compared for 50 different videos of 180 seconds each over a bandwidth range of 75 kbps to 30 Mbps. These tests enabled a thorough analysis of the HTTP-DASH streaming and ABR behavior of these services. Such detailed comparative studies can help a new service provider with requirement analysis. This study clarifies the infrastructure requirements of streaming services and the streaming strategies of different providers. The QoS measurement model adopted in this paper is carefully chosen from ITU-T standards [1,4] and the Streaming Media Alliance. Our work can serve as a base case for classifying and labeling streaming test cases when developing a supervised-learning QoS model in the future. The proposed approach can help service providers compare their services with those of their competitors without application-level details or player APIs. Service providers can also analyze these features and fine-tune them for a better ABR strategy and flow control mechanism.
The rest of the paper is organized as follows. Section 2 discusses related work and its shortcomings. Section 3 explains the CDN and cloud infrastructure of these services. Section 4 describes the method used to measure QoS. Section 5 discusses the results of our measurement studies, and Section 6 concludes our work.
2. Related Work
QoE focuses on the entire service experience, whereas QoS is a network performance measure closely tied to network deployment choices such as caching, traffic engineering, and resource reservation. QoE and QoS are closely related, and many Internet Engineering Task Force (IETF) standards define them. This paper's measurement studies are limited to QoS calculations. Many QoE models have already been surveyed in the past. QoE models are categorized into four types: signal-based, parametric, bitstream, and hybrid models. This study's QoE model is of the parametric type. A subset of QoE features forming a parametric model was also recommended by the Streaming Media Alliance; its QoE parameters are simple, and they are used in this paper.
Gathering QoE parameters for a parametric model is another challenge. As a previous survey notes, several researchers rely on MPD files to gather these metrics. When the client first requests a video, the server sends the corresponding manifest file, called the MPD, which contains details about the video such as its duration, segment size, available representation levels, and codec. The client's rate adaptation logic relies on QoE calculations based on these MPD files, and the adaptation broadly depends on the available throughput or buffer size. However, these MPD files are encrypted, and a third party cannot inspect them. A model is therefore needed that can compare different streaming services from the client's premises without requiring any specific information from the service providers. In this research, we rely solely on network- and transport-layer information gathered from packet streams to collect the QoS parameters.
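To make the manifest and rate-adaptation description concrete, the following Python sketch parses a toy, unencrypted MPD with the standard-library `xml.etree.ElementTree` module and applies a simple throughput-based selection rule. The manifest contents, the `safety` factor of 0.8, and the helper names `list_representations` and `pick_representation` are illustrative assumptions; real services use encrypted or proprietary manifests and more elaborate, often buffer-based, adaptation logic.

```python
import xml.etree.ElementTree as ET

# Namespace used by MPEG-DASH MPD documents.
DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Toy manifest; every value here is illustrative, not taken from a real service.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" mediaPresentationDuration="PT180S">
  <Period>
    <AdaptationSet mimeType="video/mp4" codecs="avc1.4d401f">
      <Representation id="240p" bandwidth="300000" width="426" height="240"/>
      <Representation id="480p" bandwidth="1000000" width="854" height="480"/>
      <Representation id="720p" bandwidth="2500000" width="1280" height="720"/>
      <Representation id="1080p" bandwidth="5000000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_text):
    """Return (id, bandwidth in bit/s) pairs sorted from lowest to highest bandwidth."""
    root = ET.fromstring(mpd_text)
    reps = root.findall(".//mpd:Representation", DASH_NS)
    return sorted(((r.get("id"), int(r.get("bandwidth"))) for r in reps),
                  key=lambda pair: pair[1])


def pick_representation(reps, throughput_bps, safety=0.8):
    """Throughput-based rule: highest bandwidth that fits within safety * throughput."""
    budget = throughput_bps * safety
    chosen = reps[0]  # fall back to the lowest rendition when nothing fits
    for rep in reps:  # reps is sorted ascending, so the last fit is the highest
        if rep[1] <= budget:
            chosen = rep
    return chosen
```

With a measured throughput of 3 Mbit/s, the rule selects the 480p rendition (720p needs 2.5 Mbit/s, above the 2.4 Mbit/s budget); when no rendition fits, it falls back to the lowest one. The same logic shows why a high available bandwidth cannot raise the received bitrate above the highest rendition listed in the manifest.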
Previous researchers used a similar approach to compare Google Cloud CDN, Azure CDN, and Amazon CloudFront using PlanetLab nodes at different locations, concluding that Amazon CloudFront had slightly better QoE than Google Cloud and Microsoft Azure. The drawback of their work is that they deployed their own Apache servers on computing instances in the various clouds to host their own DASH website. The videos were generated at a fixed bit rate, and their Python-based client collected data from different PlanetLab locations. Our measurement studies do not depend on any such deployment. Instead, they compare the three services based solely on network- and transport-layer information gathered from packet streams served directly from the providers' servers. Several studies follow an approach similar to ours [13,14]; however, they are limited to YouTube traffic. None of them compares Amazon Prime with Netflix and YouTube. Furthermore, their testbeds did not cover an Asian region such as South Korea.
3. Comparing Cloud and CDN Infrastructure
Before measuring QoS, we analyzed the cloud and CDN infrastructure of these services. This analysis helps us identify the IP addresses and domains associated with each service. It also helps us distinguish the actual content streams from third-party streams, such as advertising, analytics, and social networking plugins, and from AI streams of the same service provider.
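Once the provider's IP addresses are known, separating the content stream from third-party and auxiliary streams can be sketched in Python as follows, using the selection rule applied in the measurement procedure: keep flows whose source port is 443 and whose source IP belongs to the provider, then take the 5-tuple carrying the most bytes. The `Packet` type, the function name `pick_video_flow`, and the sample addresses (from documentation ranges) are hypothetical, and parsing the raw capture into packets is omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet record extracted from a capture."""
    proto: str     # transport protocol, e.g. "TCP" or "UDP" (QUIC runs over UDP)
    src_ip: str    # streaming server address
    src_port: int
    dst_ip: str    # client address
    dst_port: int  # browser-side port
    length: int    # bytes


def pick_video_flow(packets, service_ips):
    """Return the 5-tuple with the most bytes among provider flows from port 443, or None."""
    volume = defaultdict(int)
    for p in packets:
        if p.src_port == 443 and p.src_ip in service_ips:
            key = (p.proto, p.src_ip, p.src_port, p.dst_ip, p.dst_port)
            volume[key] += p.length
    if not volume:
        return None
    return max(volume, key=volume.get)
```

Because Chrome often spreads a session over several client ports, taking the single largest flow is a simplification; aggregating all provider flows on port 443 would be an alternative design choice.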
Some service providers have their own CDN and cloud infrastructure, whereas others depend entirely on third-party CDNs. For example, YouTube relies on Google's peering CDN and Google Cloud services. Netflix uses Amazon Web Services (AWS) for compute and its Open Connect CDN for content caching. Amazon Prime uses AWS for compute and CloudFront and Akamai for content caching.
3.1 Open Connect, CloudFront and Google Peering
CloudFront is Amazon's content delivery network (CDN). Amazon S3 buckets and EC2 instances are configured as origin servers for static and dynamic content, while CloudFront serves as the CDN at the edge locations. The first request for a piece of content is served from S3, after which the content is cached at the CloudFront edge, so subsequent requests reach it in fewer hops. For Amazon Prime Video, HTTP responses carrying video are served chiefly by CloudFront; however, when CloudFront servers are unavailable, they are sometimes served from the Akamai, Limelight (llnw), and Level 3 Communications CDNs on South Korean premises. Akamai's edge network is larger than CloudFront's, and Amazon uses it as a fallback. As of July 2017, Amazon CloudFront had 188 Points of Presence (PoPs) in 70 cities across 31 countries, whereas Akamai has 1,700 edge locations across 130 countries.
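The origin/edge behavior described above is a pull-through cache. The toy Python class below, whose name `EdgeCache` is hypothetical, assumes an unbounded cache with no expiry (real CDNs apply TTLs and eviction) and shows that only the first request for an object travels to the origin.

```python
class EdgeCache:
    """Toy pull-through edge cache in front of an origin store."""

    def __init__(self, origin):
        self.origin = origin  # dict standing in for the origin (e.g. an S3 bucket)
        self.store = {}       # objects already cached at the edge
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1     # later requests are answered at the edge
            return self.store[key]
        self.misses += 1
        obj = self.origin[key]  # first request travels to the origin
        self.store[key] = obj   # keep a copy for subsequent requests
        return obj
```

Requesting the same segment twice produces one miss (served from the origin) followed by one hit (served from the edge).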
Similar to Netflix, Google deploys its caching infrastructure through peering. If an ISP expresses interest in hosting Google's edge cache, Google usually ships servers to be installed in the ISP's facilities. Once deployed, these servers become part of the Google Global Cache (GGC) infrastructure and of AS15169. Apart from GGC, Google has core data centers at the center of its network and an edge PoP network as a middle layer. Google's edge software-defined networking (SDN) architecture is called “Espresso,” and its core SDN architecture is called “Jupiter.” Google usually does not allow peering with its core network infrastructure. The GGC infrastructure serves 60% to 80% of the total traffic, saving backbone and transit bandwidth. Google also offers the Google Cloud Interconnect (GCI) service, which connects customer networks directly to Google Cloud Platform. On South Korean premises, Google rents a cache in the KT network and uses the Google cache in Japan as a backup.
3.2 Comparative Analysis of Services
Most streaming service providers use third-party services for advertising, analytics, social networking plugins, or AI on their web pages. These requests are served from different domains, and sometimes from different clouds and CDNs, to provide a satisfactory overall experience.
All three services use HTTP DASH at the application layer and TCP at the transport layer, except that YouTube uses Google QUIC over UDP on Chrome and Firefox. Netflix usually binds users to preferred CDNs, whereas Amazon and YouTube bind users to multiple CDNs. If the main Amazon Prime domain does not respond, requests are served by Akamai, then Limelight (llnw), and then Level 3 Communications domains. Similarly, if YouTube requests cannot be served from the Google cache at KT, they are served by Google US domains. The network request blocking feature of Chrome DevTools was used because it can block a particular domain, server, or list of servers. This feature is useful for determining whether users are attached to a dedicated server or CDN and whether, on failure, they can be switched to other servers and CDNs. For login pages, Amazon connects to fewer domains and hence requires fewer DNS lookups and SSL handshakes.
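The blocking experiment can be modeled as an ordered failover list. This simplified Python sketch assumes that the client always tries CDNs in the fixed order observed in this study and that blocking a CDN in DevTools makes all of its domains unreachable; the function name `serve_from` and the CDN labels are hypothetical.

```python
# Fallback orders observed in this study, expressed as plain labels.
AMAZON_CDN_ORDER = ["CloudFront", "Akamai", "llnw", "Level 3"]
YOUTUBE_CDN_ORDER = ["Google cache (KT)", "Google US"]


def serve_from(cdn_order, blocked):
    """Return the first CDN in cdn_order that is not blocked, or None if all are."""
    for cdn in cdn_order:
        if cdn not in blocked:
            return cdn
    return None
```

Blocking CloudFront moves Amazon traffic to Akamai, and blocking the KT cache moves YouTube traffic to Google US; a provider that binds users to a single preferred CDN would correspond to a one-element list.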
4. Experimental Results
After analyzing the CDN and cloud infrastructure of these services, we collected 50 video streams of 180 seconds each, over a bandwidth range of 75 kbps to 30 Mbps, for each of the three services using the Chrome browser.
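The collection setup can also be scripted. The sketch below is an assumption rather than the procedure used in this study, which relied on the DevTools interface: it builds parameters for the Chrome DevTools Protocol command `Network.emulateNetworkConditions`, whose throughput fields are in bytes per second, and a TShark command line that stops after 180 seconds through the `-a duration:` autostop condition. Applying the same limit to the uplink and the interface name used below are assumptions.

```python
def devtools_throttle_params(download_kbps, upload_kbps=None, latency_ms=0):
    """Arguments for the CDP command Network.emulateNetworkConditions.

    CDP expects throughput in bytes per second, so kbit/s is converted by * 1000 / 8.
    By assumption, the uplink gets the same limit unless upload_kbps is given.
    """
    if upload_kbps is None:
        upload_kbps = download_kbps
    return {
        "offline": False,
        "latency": latency_ms,
        "downloadThroughput": download_kbps * 1000 / 8,
        "uploadThroughput": upload_kbps * 1000 / 8,
    }


def tshark_command(interface, outfile, duration_s=180):
    """TShark argument list that writes a capture file and stops after duration_s seconds."""
    return ["tshark", "-i", interface, "-a", f"duration:{duration_s}", "-w", outfile]
```

With Selenium's Chromium driver, the parameters could be applied through `driver.execute_cdp_cmd("Network.emulateNetworkConditions", devtools_throttle_params(250))`, and the capture started in the background with `subprocess.Popen(tshark_command("eth0", "trace.pcapng"))` so that it runs while the video plays; both steps require Chrome and TShark to be installed, and `eth0` is a placeholder interface name.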
Fig. 1 provides an overview of this study’s QoS inference approach.
Fig. 1. QoE measurement and calculation.
(1) The bandwidth configuration options of the Chrome browser's DevTools were used to set the client bandwidth, and the video stream was then captured with the TShark tool for a total of 180 seconds.
(2) While streaming, the browser receives several types of HTTP responses from different non-origin servers, including audio, video, advertisements, streaming logic, and other AI-related requests. Together, these responses provide a holistic video-watching experience. The first step is to isolate the video stream from the many other streams. It was identified as the flow with the highest amount of data among successive IP packets sharing the same 5-tuple: protocol (TCP/QUIC), source address (streaming server IP address), source port (443), destination address (host address), and destination port (the port on which the browser receives the stream; Chrome often uses multiple ports for streaming). To classify the right stream, the source IP address was matched against the list of the service provider's domains, and the source port was required to be 443. Automatic HTTP stream classification could be a separate research topic, but it is not the focus of this paper.
(3) Once the stream was identified, only three features were extracted from the packet trace: time (ti), segments (si), and segment length (sl).
(4) These three features are used to calculate the paper's main QoS metrics: rebuffering time, bitrate, progressive download ratio, and SD of the On-Off cycle. The parameters are defined in detail below.
Playtime (t): This is the actual length of the video. This duration does not include rebuffering time. All the test cases have a total playtime of 180 seconds.
Actual play time (tn): This is the time taken to play the video, including jitter and rebuffering. For a 180-second video, an application can take more than 180 seconds to play it completely, especially under poor network conditions.
Rebuffering time: This is the time during which a viewer experiences rebuffering (i.e., when the video stops playing because of buffer underflow rather than user interventions such as scrubbing or pausing).
Rebuffering ratio: This is a calculation of total rebuffering time divided by the sum of total playtime.
Bitrate: This metric is the average bit rate at which content was received. It is calculated as the number of bits received and decoded during playback divided by the total playtime.
Buffering phase: In the buffering phase, the data transfer rate is limited by the end-to-end available bandwidth. In Fig. 2(a), Netflix downloads more than 7,000 segments (sb) of 1,541 bytes (sl) each in just the first 5 seconds (tb); these 5 seconds constitute the buffering phase.
Steady-state phase: In the steady-state phase, the average download rate is slightly higher than the video encoding rate. In Fig. 2(a), Netflix downloads 6,061 segments (ss) of 1,541 bytes (sl) in 166 seconds (ts).
Progressive download ratio: This is the ratio of the steady-state bitrate to the average bitrate. If enough segments have been buffered, this ratio is low. A lower progressive download ratio means that the bit rate in the buffering phase is higher, yielding enough downloaded segments for playback. A progressive download ratio of 1 indicates that no buffering was done in the buffering phase.
SD of On-Off cycle: This is the standard deviation of the On-Off cycle of a stream in its steady-state phase. During the On cycle, a chunk of data, also called a block, is downloaded. A low SD means that the stream is downloaded at a constant bit rate; a high SD means that the bit rate varied during the download, indicating a jittery stream.
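Under the definitions above, bitrate, progressive download ratio, and rebuffering time reduce to simple arithmetic on the extracted features. The Python sketch below assumes that every segment has the same length sl, that the average bitrate is computed over the nominal playtime t (as in the Bitrate definition), and that rebuffering time is the excess of the actual play time tn over t; the function names are hypothetical. Plugging in the approximate Fig. 2(a) Netflix values (7,000 buffering-phase segments, taken as a lower bound for "more than 7,000," plus 6,061 steady-state segments of 1,541 bytes in 166 seconds) gives a steady-state bitrate of about 450 kbit/s and a ratio of about 0.50; these are illustrative numbers, not additional measurements.

```python
def bitrate_bps(n_bytes, seconds):
    """Average bit rate in bit/s for n_bytes delivered over the given duration."""
    return 8 * n_bytes / seconds


def qos_metrics(n_buffer, n_steady, seg_len, t_steady, playtime, actual_playtime):
    """Bitrate-related QoS metrics from segment counts (equal-length segments assumed).

    n_buffer, n_steady: segments downloaded in the buffering and steady-state phases
    seg_len: segment length in bytes (sl); t_steady: steady-state duration in s (ts)
    playtime: nominal playtime t in s; actual_playtime: tn in s
    """
    total_bytes = (n_buffer + n_steady) * seg_len
    average = bitrate_bps(total_bytes, playtime)        # bits received / playtime t
    steady = bitrate_bps(n_steady * seg_len, t_steady)  # steady-state phase only
    return {
        "rebuffering_s": max(0.0, actual_playtime - playtime),  # tn - t
        "average_bps": average,
        "steady_bps": steady,
        # Lower values mean a stronger initial buffering burst.
        "progressive_download_ratio": steady / average,
    }


netflix = qos_metrics(7000, 6061, 1541, 166, 180, 180)
```

Taking the steady-state rate as the numerator keeps the ratio below 1 whenever the buffering phase downloads faster than the steady state, which matches the reading that lower values indicate better buffering.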
5. Measurement Results and Discussion
The streaming behaviors of 50 different videos were compared across the three service providers. We carefully chose common videos for the tests across these platforms. The basic ABR characteristics of all three services are the same: the video is downloaded in two phases, a buffering phase followed by a steady-state phase. In the steady-state phase, an On-Off cycle is used to limit the download rate. Here we compare the three services on the four main QoS features of our model.
Fig. 2. Number of segments downloaded per timestamp for Peppa Pig, Episode 1: (a) download speed of 37 Mbps and (b) download speed of 250 kbps.
Bitrate: For the same content, Amazon downloads the most segments (16,000), followed by Netflix (14,000) and YouTube (6,000). Amazon shows the best bit rate, followed by Netflix and then YouTube (Fig. 2(a)). At lower bandwidth, Netflix and Amazon adjust their bitrates well through the HTTP-DASH ABR protocol (Fig. 2(b)). For this particular video, YouTube does not have an encoding at lower bit rates, so its bitrate remains high (Fig. 2(b)). Fig. 3 shows that bitrate is directly proportional to the available bandwidth. Even when the available bandwidth is high, some videos are not encoded at high resolutions, so they use less bandwidth and achieve a lower bitrate. Fig. 4, which shows results for videos of different qualities and resolutions from each service provider, illustrates this behavior: although the available bandwidth is high (37 Mbps), some videos show a lower bitrate because they are encoded at low bit rates and are not available in high resolutions. Hence, bit rate alone cannot quantify the quality of a stream. We therefore also compare the rebuffering time, the progressive download ratio, and the SD of the On-Off cycle to assess video quality.
Rebuffering time: We tested 50 different videos with different client bandwidth configurations on all three platforms to calculate rebuffering time. The rebuffering time of one of the videos at different bandwidths is shown in Fig. 3, where the time beyond the blue line represents rebuffering time. At higher bandwidths, rebuffering time is zero for all three services, indicating good QoS; at lower bandwidths, however, YouTube's rebuffering time is higher than that of Netflix and Amazon (Fig. 3). For Netflix, bandwidths below 250 kbps could not be measured because Netflix does not provide service at such low bandwidth; Amazon and YouTube, however, could be tested at bandwidths as low as 75 kbps. Netflix shows the lowest rebuffering among these service providers. In some cases, it even finishes downloading the video ahead of playback, even at lower bandwidths. This reflects the good overall quality of Netflix streams.
Progressive download ratio: The progressive download ratio is the ratio of the steady-state bit rate to the overall bit rate. The lower the progressive download ratio, the better the QoE of the video. We measured the progressive download ratio for 50 different videos at the highest bandwidth of 30 Mbps, as shown in Fig. 4; for clarity, the figure shows results for only 20 videos. The slopes of the buffering and steady-state phases depend on the available bandwidth. At lower bandwidths, none of the services has a buffering phase, so the progressive download ratio cannot be calculated. The Netflix buffering phase downloads more segments in less time than those of Amazon and YouTube, and Netflix has the lowest average progressive download ratio, 0.49. This shows that its playback buffer accumulates enough data for playback, resulting in the best QoE.
SD of On-Off cycle: Another interesting characteristic that separates the three services is their On-Off cycle. Netflix and YouTube have longer On cycles than Amazon, meaning that a larger chunk of data is downloaded at a time, and they also have fewer On-Off blocks. A longer On cycle helps fill the playback buffer but requires higher bandwidth. Amazon compensates by having more On-Off cycles than Netflix and YouTube, so even though its cycle size is small, it downloads an equal amount of data over the same period. We also observe that, in general, a high SD of the On-Off cycle indicates poor video quality. In our overall experience, the SD of the On-Off cycle of Netflix and Amazon is much lower than that of YouTube. Hence, Netflix and Amazon show more consistent bit rates than YouTube, resulting in less jitter and better video quality.
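One way to obtain the On-Off blocks from a steady-state packet trace is to split it at idle gaps. In the Python sketch below, the 0.5-second gap threshold is an assumption, and the SD is taken over the bytes downloaded per On block (the block sizes); the paper does not specify which quantity or threshold it used, and the function names are hypothetical.

```python
import statistics


def on_blocks(samples, gap_s=0.5):
    """Split (timestamp_s, n_bytes) samples into On blocks.

    A silence longer than gap_s seconds is treated as an Off period and starts
    a new block. Returns the number of bytes downloaded in each On block.
    """
    blocks = []
    last_t = None
    for t, n_bytes in sorted(samples):
        if last_t is None or t - last_t > gap_s:
            blocks.append(0)
        blocks[-1] += n_bytes
        last_t = t
    return blocks


def on_off_sd(samples, gap_s=0.5):
    """Population SD of On-block sizes; 0.0 when there are fewer than two blocks."""
    blocks = on_blocks(samples, gap_s)
    return statistics.pstdev(blocks) if len(blocks) > 1 else 0.0
```

A trace with equal blocks every 4 seconds yields an SD of 0, while a trace whose blocks alternate between 1,500 and 4,500 bytes yields an SD of 1,500 bytes, the kind of variation the text associates with a jittery stream.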
Netflix performs best among the three service providers, showing a high bitrate at lower bandwidths, less rebuffering (Fig. 3), a better progressive download ratio (Fig. 4), and a lower SD of the On-Off cycle. Even though common content available from all three providers was carefully chosen, the encoding rates of the original content could differ, and the exact encoding-related metrics of these providers cannot be known; hence, we rely only on the bitrate received at the end-user client. The results could also vary across geographic locations, depending on the edge infrastructure. This study demonstrates how one can compare different streaming services from a client machine and choose the service best suited to one's needs.
Fig. 3. Rebuffering: (a) Netflix, (b) Amazon Prime, and (c) YouTube.
Fig. 4. Progressive download ratio: (a) Netflix, (b) Amazon Prime, and (c) YouTube.
6. Conclusion and Future Work
In this paper, the most popular global streaming service providers, Netflix, Amazon Prime, and YouTube, were surveyed. Their infrastructure and protocols were studied, their services were compared, and a methodology for QoS analysis was derived. QoS analysis was conducted for these providers by calculating bitrate, rebuffering time, progressive download ratio, and SD of the On-Off cycle at different bandwidths for 50 videos each. Based on these calculations, we concluded that Netflix had the best performance, although the QoS of Amazon Prime was also reasonably good.
This paper demonstrates an approach to quality analysis based on the flow of information. The approach can also be used to compare the importance of different features of these services. Service providers can use our analysis to assess their competitors and fine-tune their streaming strategies. In the future, a machine learning approach to distinguish good- and poor-quality video services should be devised.