Quality, more quality and more more quality

Quality measurement is an essential ingredient of the MPEG business model that targets the development of the best performing standards that satisfy given requirements.

MPEG was certainly not the first to discover the importance of media quality assessment. Decades ago, when still called Comité Consultatif International des Radiocommunications (CCIR), ITU-R developed Recommendation 500 – “Methodologies for the subjective assessment of the quality of television images”. This recommendation guided the work of television labs for decades. It was not possible, however, to satisfy all MPEG needs with BT.500, the modern name of CCIR Recommendation 500, for three main reasons: MPEG needed methods to assess the impact of coding on video quality; MPEG dealt with a much wider range of moving pictures than television; and MPEG ended up dealing with more than just 2D rectangular moving pictures.

Video quality assessment in MPEG began in November 1989 at the research laboratories of JVC in Kurihama, when all aspects of the responses to the MPEG-1 Call for Proposals (CfP), including quality, were considered. Two years later MPEG met again in Kurihama to consider the responses to the MPEG-2 CfP. At that time the assessment of video quality was done using the so-called Double Stimulus Impairment Scale (DSIS) method with a 5-grade impairment scale. In both tests massive use of digital D1 tapes was made to deliver undistorted digital video to the test facility. The Test subgroup, led by its chair Tsuneyoshi Hidaka, managed all the logistics of D1 tapes coming from the four corners of the world.

The MPEG Test chair was able to convince the JVC management to offer free use of the testing facilities for MPEG-1. However, he could not achieve the same for MPEG-2. Therefore MPEG-2 respondents were asked to pay for the tests. Since then, participation in most if not all subjective test campaigns has been subject to the payment of a fee to cover the use of facilities and/or the human subjects who were requested to view the video sequences under test. The MPEG-1 and MPEG-2 tests were carried out in the wake of Recommendation BT.500.

The MPEG-4 tests, carried out in 1995, fundamentally changed the scope because the CfP addressed multimedia content, i.e. progressively scanned moving images, typically at lower resolution than TV, which were supposed to be transmitted over noisy channels (videophone over fixed subscriber lines or the nascent mobile networks). The statistical processing of subjective data applied to the MPEG-4 CfP was innovated by the use of ANOVA (analysis of variance); until then, tests had only used the simple mean value and the Grand Mean, i.e. the mean value computed considering the scores assigned to several video sequences.

The use of Statistically Significant Difference (SSD) allowed a precise ranking of the technologies under test. Traditional test methods (DSIS and SS) were used together with the new Single Stimulus Continuous Quality Evaluation (SSCQE) test method to evaluate “long” video sequences of 3 minutes and to measure how well a video compression technology could recover from transmission errors. The tests were carried out using the D1 digital professional video recorder and Professional Studio Quality “grade 1” CRT displays.
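As a hedged illustration of the statistics involved (the viewer scores below are invented, not MPEG data), a one-way ANOVA F-statistic can be computed from the scores given to two codecs; an F value far above the critical value for the relevant degrees of freedom indicates a statistically significant difference between their mean scores.

```python
# Illustrative sketch: a minimal one-way ANOVA F-statistic, the kind of
# computation behind "statistically significant difference" rankings.
def anova_f(*groups):
    """Return the one-way ANOVA F-statistic for the given score groups."""
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k = len(groups)        # number of groups (codecs)
    n = len(all_scores)    # total number of scores
    # Between-group sum of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares
    ss_within = sum(sum((s - sum(g) / len(g)) ** 2 for s in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

codec_a = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.0]   # hypothetical viewer scores
codec_b = [3.2, 3.5, 3.1, 3.4, 3.3, 3.0, 3.6, 3.2]

f = anova_f(codec_a, codec_b)
# For df = (1, 14), the 5% critical value of F is about 4.60; a much
# larger F means the two codecs differ significantly.
print(f"F = {f:.1f}")
```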

The Digital Cinema test, carried out in 2001 at the Entertainment Technology Center (ETC) of the University of Southern California, was designed to evaluate cinematic content in a real theatrical environment, i.e. on a 20 m base perforated screen, projected by a cinema projector fed with digital content. The subjective evaluations were done with three new test methods: the Expert Viewing Test (EVT), a two-step procedure where the results of a DSIS test were refined by means of careful observation by a selected number of “golden eye” observers; the Double Stimulus Perceived Difference Scale (DSPDS), a double stimulus impairment detection test method using a 5-grade impairment scale; and the Double Stimulus Split-Screen Perceived Difference Scale (S3PDS), a test method based on a split-screen approach where both halves of the screen were observed in sequence.

The tests for the Call for New Tools to Further Improve Coding Efficiency were done using traditional test methods and the same methodology and devices as the MPEG-4 Call for Proposals. The tests demonstrated the existence of a new technology in video compression and allowed the collaboration between ISO and ITU-T in the area of digital video coding to resume. This was the first test to use the 11-grade impairment scale, which became a reference for DSIS and SS test experiments and provided a major improvement in result accuracy.

A new test method – the VSMV-M Procedure – was designed in 2004 to assess the submissions received for the Core Experiments on Scalable Video Coding. The Procedure was made of two phases: a “controlled assessment” phase, carried out according to the DSIS and SS test methods, and a “deep analysis” phase, designed by MPEG, where a panel of experts confirmed the ranking obtained in the formal subjective assessment. These tests were the first to be entirely based on digital video servers and DLP projectors. Therefore, 15 years after they were first used in the MPEG-1 tests, D1 tapes were finally put to rest.

The SVC Verification Tests, carried out in 2007, represented another important step in the evolution of the MPEG testing methodology. Two new test methods were designed: the Single Stimulus Multi-Media (SSMM) and the Double Stimulus Unknown Reference (DSUR). The SSMM method minimised the contextual effect typical of the Single Stimulus (SS) method, while the DSUR, derived from the Double Stimulus Impairment Scale (DSIS) Variant II, introduced some of the advantages of the Double Stimulus Continuous Quality Scale (DSCQS) method into the DSIS method while avoiding the tricky and difficult data processing of DSCQS.

The Joint Call for Proposals on Video Compression Technology (HEVC) covered 5 different classes of content, with resolutions ranging from WQVGA (416×240) to 2560×1600, in two configurations (low delay and random access) for different classes of target applications. It was a very large test effort: it covered a total of 29 submissions, lasted 4 months and involved 3 laboratories, which assessed more than 5000 video files and hired more than 2000 non-expert viewers. The ranking of submissions was done considering the Mean Opinion Score (MOS) and Confidence Interval (CI) values. A procedure was introduced to check that the results provided by different test laboratories were consistent: the three laboratories shared a common test set that made it possible to measure the impact of a laboratory on the results of a test experiment.
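As a rough sketch of the arithmetic behind such rankings (the scores below are invented), the MOS is simply the mean of the raw viewer scores and the 95% confidence interval half-width follows from the sample standard deviation:

```python
# Hedged sketch: computing a Mean Opinion Score (MOS) and its 95%
# confidence interval from raw viewer scores (scores are made up).
from math import sqrt

def mos_ci(scores, z=1.96):
    """Return (MOS, half-width of the 95% confidence interval)."""
    n = len(scores)
    mos = sum(scores) / n
    # Sample variance (n - 1 denominator)
    var = sum((s - mos) ** 2 for s in scores) / (n - 1)
    ci = z * sqrt(var) / sqrt(n)
    return mos, ci

scores = [7, 8, 6, 7, 9, 7, 8, 6, 7, 8]   # hypothetical 11-grade scale scores
mos, ci = mos_ci(scores)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

Two submissions whose MOS ± CI intervals do not overlap can be ranked with confidence; overlapping intervals signal that the difference may not be significant.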

A total of 24 complete submissions were received in response to the Joint Call for Proposals on 3D Video Coding (stereo and auto-stereo) issued in 2012. For each test case each submission produced 24 files representing the different viewing angles. Two sets of two and three viewing angles were blindly selected to synthesise the stereo and auto-stereo test files. The test was done on standard 3D displays (with glasses) and auto-stereoscopic displays. A total of 13 test laboratories took part in the test, running a total of 224 test sessions and hiring around 5000 non-expert viewers. The test applied a full redundancy scheme, where each test case was run by two laboratories, to increase the reliability and the accuracy of the results. The ranking of the submissions was done considering the MOS and CI values. This test represented a further improvement in the control of the performance of each test laboratory. The test could ensure full result recovery in the case of failure of up to 6 out of the 13 testing laboratories.

The Joint CfP for Coding of Screen Content was issued to extend the HEVC standard in order to improve the coding performance of typical computer screen content. When it became clear that the set of test conditions defined in the CfP was not suitable to obtain valid results, the test method was modified from the original “side by side” scheme to a sequential presentation scheme. The complexity of the test material led to the design of an extremely accurate and long training of the non-expert viewers. Four laboratories participated in the formal subjective assessment test, assessing and ranking the seven responses to the CfP. More than 30 test sessions were run (including the “dry run” phase), hiring around 250 non-expert viewers.

The CfP on Point Cloud Coding was issued to assess coding technologies for 3D point clouds. MPEG had no experience (actually, no one had) in assessing the visual quality of point clouds. MPEG projected the 3D point clouds to 2D spaces and evaluated the resulting 2D video according to formal subjective assessment protocols. The video clips were produced using a rendering tool that generated two different video clips for each of the received submissions, under the same creation conditions: rotating views of 1) a fixed synthesised image and 2) a moving synthesised video clip. The rotations were blindly selected.
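The projection step can be sketched as follows. This is a toy orthographic projection, not the rendering tool MPEG actually used: each 3D point is rotated by a (blindly selected) angle around the vertical axis and then mapped onto a 2D image plane by dropping the depth coordinate.

```python
# Minimal sketch: project a 3D point cloud onto a 2D plane after
# rotating it around the vertical (y) axis.
from math import cos, sin, radians

def project_rotated(points, angle_deg):
    """Rotate (x, y, z) points around the y axis, then drop z
    (orthographic projection) to obtain 2D image-plane coordinates."""
    a = radians(angle_deg)
    projected = []
    for x, y, z in points:
        xr = cos(a) * x + sin(a) * z   # rotated x coordinate
        # y is unchanged; the rotated z is discarded by the projection
        projected.append((xr, y))
    return projected

cloud = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # toy point cloud
print(project_rotated(cloud, 90))
```

Repeating this for a sequence of angles yields the rotating 2D views that can then be assessed as ordinary video.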

The CfP for Video Compression with Capability beyond HEVC included three test categories, for which different test methods had to be designed. The Standard Dynamic Range category was a compression efficiency evaluation process where the classic DSIS test method was applied with good results. The High Dynamic Range category required two separate sessions, according to the peak luminance of the video content taken into account, i.e. below (or equal to) 1K nits and above 1K nits (namely 4K nits); in both cases the DSIS test method was used. The quality of the 360° category was assessed in a “viewport” extracted from the whole 360° scene at HD resolution.
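A minimal sketch of the core operation behind viewport extraction, assuming an equirectangular source frame (a common 360° format; the CfP's actual extraction pipeline is not described here): each viewing direction on the unit sphere maps to a pixel of the equirectangular frame, and a full extractor applies this lookup through a perspective projection for every viewport pixel.

```python
# Hedged sketch: mapping a viewing direction to a pixel of an
# equirectangular 360° frame - the basic lookup behind extracting an
# HD "viewport" for subjective viewing.
from math import pi, atan2, asin, sqrt

def direction_to_equirect(x, y, z, width, height):
    """Map a direction (x, y, z) to (column, row) in a
    width x height equirectangular frame."""
    yaw = atan2(x, z)                         # longitude, -pi..pi
    pitch = asin(y / sqrt(x*x + y*y + z*z))   # latitude, -pi/2..pi/2
    col = (yaw / (2 * pi) + 0.5) * (width - 1)
    row = (0.5 - pitch / pi) * (height - 1)
    return col, row

# Looking straight ahead (0, 0, 1) lands in the centre of the frame
print(direction_to_equirect(0.0, 0.0, 1.0, 3840, 1920))
```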

When the test was completed, the design of the 36 “SDR”, 14 “HDR” and 8 “360°” test sessions was verified. For each test session, the distribution of the raw quality scores was analysed to verify that the level of visual quality was evenly distributed across the many test sessions.

This was a long but still incomplete review of 30 years of subjective visual quality assessment in MPEG. This ride across 3 decades should demonstrate that MPEG draws from established knowledge to create new methods that are functional to obtaining the results MPEG is seeking. It should also show the level of effort involved in assigning tasks, coordinating the work and producing integrated results that provide the responses. Most important is the level of human participation involved: 2000 people (non-experts) for the HEVC tests!

 

Posts in this thread

Developing MPEG standards in the viral pandemic age

Introduction

For 30 years industry has been accustomed to relying on MPEG as the source of the standards it needs. In those 30 years MPEG has held a record 129 meetings, roughly spaced 3 months apart.

What happens if MPEG130 is not held? Can industry afford it?

In this article I will try to answer this not-so-hypothetical question.

An MPEG meeting (physical)

In Looking inside an MPEG meeting I have illustrated the “MPEG cycle” workflow using the figure below

 

At the plenary session of the previous N-1th meeting, MPEG approves the results achieved and creates some 25 Ad hoc Groups (AhGs). Taking one example from MPEG129, each AhG has a title (Compression of Neural Networks for Multimedia Content Description and Analysis), chairs (Werner Bailer, Sungmoon Chun and Wei Wang) and a set of mandates:

  1. Collect more diverse types of models and test data for further use cases, working towards a CfP for incremental network representation
  2. Perform the CEs and analyse the results
  3. Improve the working draft and test model
  4. Continue analyzing the state of the art in NN compression and exchange formats
  5. Continue interaction with SC42, FG ML5G, NNEF, ONNX and the AI/ML community

Work is carried out during the typical ~3 months between the end of the N-1th meeting and the next Nth meeting using e-mail reflectors, conference calls or, less frequently, physical meetings. Documents are shared by AhG members using the MPEG Document Management System (MDMS).

When the date of the next meeting approaches, AhGs wrap up their conclusions and many of them hold physical meetings on the weekend prior to the “MPEG week”.

On the Monday morning of the MPEG week, AhGs report their results to the MPEG plenary. In the afternoon, subgroups (Requirements, Systems, Video, Joint groups with ITU, Audio, 3DG and Test) hold short plenaries after which Break-out Groups (BoGs), often a continuation of AhGs, carry out their work interspersed with joint meetings of subgroups and BoGs.

Two more plenaries are held: on Wednesday morning to make everybody aware of what has happened in groups a member might not have had the opportunity to attend and on Friday afternoon to ratify or, if necessary, reconsider, decisions made by the subgroups.

The Convenor and the Chairs meet at night to assess progress and coordinate work between subgroups and BoGs. A typical function is the identification of joint meetings.

ICT at the service of MPEG

Some 500 people are involved in an MPEG week. At times some 10-15 meeting sessions are held in parallel.

Most of this is possible because of the ICT facilities MPEG prides itself on. Developed by Christian Tulvan, they run on servers made available by Institut Mines Télécom.

Currently the MPEG Meeting Support System (MMSS) includes a calendar where subgroup chairs record all subgroup and BoG sessions adding a description of the topics to be discussed. The figure below gives a snapshot of the MMSS calendar. This of course has several views to serve different needs.

In Digging deeper in the MPEG work, I described MDMS and MMSS. Originally deployed in 1995, MDMS has been one of the greatest contributors to MPEG’s skyrocketing rise in performance. In addition to providing the calendar, MMSS also enables the collation of all results produced by the galaxy of MPEG organisational units depicted below.

The third ICT support is the MPEG Workplan Management System (MWMS). This provides different views of the relevant information on MPEG standards that is needed to execute the workplan.

MPEG online?

Now imagine, and probably you don’t have to stretch your imagination too much, that physical meetings of people are banned but industry requests are so pressing that a meeting must be held, no matter what, because product and service plans depend so much on MPEG standards.

MPEG is responding to this call of duty and is attempting the impossible by converting its 131st (physical) meeting of 500 experts to a full online meeting retaining as much as possible the modus operandi depicted in the figures above.

In the following I will highlight how MPEG is facing what is probably its biggest organisational challenge ever.

The first issue to be considered is that, no matter how skilfully MPEG handles its first online meeting, productivity is going to be lower than a physical meeting could yield. This is because by and large the majority of the time of a physical MPEG meeting is dedicated to intense technical discussions in smaller (and sometimes not so small) groups. At an online meeting, such discussions will at best be a pale replica of those at a physical meeting, where experts are pressed by the number and the complexity of the issues, the arguments they make, the little time available and the need to get to a conclusion; the handling of interventions will also be clumsier.

MPEG is facing this challenge by asking AhGs to come to the online meeting with much more solid conclusions than usual so that the results that will be brought to the online meeting will be more mature and will require less debate to be adopted. This has generated a surge in conference calls by the groups who are more motivated by the need to achieve firm results at the next meeting.

Another way to face the challenge is by being realistic about what is achievable at an online meeting. Issues that are very complex and possibly less urgent will be handled with a lower priority or not considered at all, if of course the membership agrees. Therefore the management will set the meeting goals, balancing urgency, maturity and achievability of results. Of course experts, individually or via AhGs, will have an opportunity to make themselves heard.

Yet another way to face the challenge is by preparing, in advance of the MPEG week, a very detailed assignment of time slots to issues for the entire week. So far this was done only partially, because MPEG allowed experts as much time as possible to prepare and upload their contributions for others to study and be ready to discuss at the meeting. This has always forced the chairs to prepare their schedule at the last minute or even during the week as the meeting unfolds. This time MPEG asks its experts to submit their contributions a full week in advance, with an extended abstract, to facilitate the task of the chairs, who have to understand tens and sometimes hundreds of contributions and properly assign them to homogeneous sessions.

The schedules will balance the need to achieve as many results as possible (i.e. parallel sessions) with giving the opportunity to as many members as possible to attend (i.e. sequential sessions).

The indefatigable Christian Tulvan, the mind and the arm behind MDMS and MMSS, is currently working to extend MMSS to enable the chairs to add the list of documents to be considered and to create online session reports shared with and possibly co-edited by session participants.

So far MPEG has mostly been lenient toward late contributions (accepted if there is consensus to review the contribution). This time late contributions will simply not be considered.

No matter how good the forecast is, the schedule is expected to change as the week progresses. If a change during the meeting is needed, it will be announced at least 24 hours in advance.

The next big challenge is the fact that MPEG is a truly global organisation. We do not have Hawaiian experts in attendance, but we do have experts from Australia (East Coast) to the USA (West Coast). That makes a total of 19 time zones. Therefore MPEG130 online will be conducted in 3 time slots starting at 05:00, 13:00 and 21:00 GMT. The sessions inside each slot will last less than 2 hours, each followed by a break.
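To see what the three slots mean in practice, the GMT start times can be converted to a few member time zones. The zone choices, the date and the fixed offsets below are illustrative only (a real schedule must account for daylight saving time):

```python
# Illustrative sketch: the three GMT slot start times in local time
# for a few member regions, using fixed UTC offsets.
from datetime import datetime, timedelta, timezone

zones = {"Sydney": 10, "Paris": 2, "Los Angeles": -7}   # hypothetical offsets
slots_gmt = [5, 13, 21]                                  # slot start hours, GMT

for hour in slots_gmt:
    start = datetime(2020, 4, 20, hour, tzinfo=timezone.utc)
    local_times = ", ".join(
        f"{name} {start.astimezone(timezone(timedelta(hours=off))):%H:%M}"
        for name, off in zones.items()
    )
    print(f"{hour:02d}:00 GMT -> {local_times}")
```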

Conclusions

Last but not least: MPEG is confident that the current emergency will be called off soon. The situation we are facing, however, is new and we simply don’t know when it will be over, or whether this is just the first of future pandemics.

With MPEG130 online, MPEG wants not only to respond to the current industry needs, but also to fine-tune its processes in an online context so as to be always ready to serve the industry and the people industry serves, no matter what the external circumstances are.

I don’t underestimate the challenge MPEG is facing with MPEG130 online, but I know I can rely on a dedicated leadership and membership.


The impact of MPEG on the media industry

MPEG was established as an experts group on 1988/01/22 in Copenhagen, a little more than 32 years ago. At that time, content media were already very important: voice communication; vinyl, compact cassettes and compact discs for audio; radio, mostly on terrestrial Hertzian channels; and television on 4 physical media: terrestrial Hertzian channels, satellite, cable and package media.

The way individual media evolved was a result of the technology adopted to represent content media and the way content media were distributed. Industries shared some elements of the technologies, but each industry introduced many differences. The situation was further exacerbated by different choices made by different countries and regions, sometimes justified by the fact that some countries introduced a technology earlier (like the 405 lines of UK TV before WW II and the 525 lines of US TV some years later). In some other cases there was no justification at all.

The figure below represents the actors of 1988:

  1. Two forms of wireless radio and television (terrestrial and satellite)
  2. Wired radio and television (cable)
  3. Physical distribution (package media)
  4. Theatrical movies distribution
  5. Content industry variously interconnected with the distribution industries.

The figure also includes two industries which, at that time, did not have an actual business in content distribution. Telecommunications was actively vying for a future role (although at that time some telcos were running cable television services as a business separate from telephony, both as public services). The second industry was information technology. Few at that time expected that the internet protocol, an outcome of the information technology industry designed to enable computers to communicate, would become the common means to transport media. However, eventually that is what it did.

The figure should be more articulated; indeed, it does not include manufacturers. At that time consumer electronics served users of the broadcasting service, but broadcasting had its own manufacturing industry for the infrastructure. The consumer electronics industry was itself the package media industry. Telcos had a manufacturing industry of their own for the infrastructure and a separate manufacturing industry for terminal devices, with some consumer electronics or office equipment companies providing facsimile terminals.

Even though it did not happen overnight, MPEG came, saw and unified. Today all the industries in the figure maintain a form of individual existence but they are much more integrated, as represented by the figure below.

Industry convergence has become a much-abused word. However, it is true that standard and efficient digital media have enabled the industries to achieve enormous savings in moving to digital, and in expanding from it, by allowing reuse of common components, possibly from hitherto remote industries. A notable example is MPEG Media Transport (MMT), which provides the means to seamlessly move from one-way to two-way media distribution because IP is the underlying common protocol.

The net result of convergence can be summarised in two points:

  1. Industry: MPEG-enabled products (devices) & services are worth 1.5 T$ p.a., i.e. ~1.8% of the Gross World Product
  2. Consumers: billions of consumers enjoy media anytime and anywhere.

It would be silly to claim that MPEG is the only one that can claim merit for this result. There are many other standards bodies/committees who share in this result. The figure below shows some of them. It should be clear, however, that everything started from MPEG, while other bodies took over from where MPEG left the technology.

Two words about the semantics of the figure. A black line without arrows signifies that MPEG is in liaison with the body. A black line with one arrow means that MPEG is providing or has provided standards to that body. A black line with two arrows means that the interchange is or has been two-way. Finally, a red line means that MPEG has actually developed standards with that body. The numbers refer to the number of jointly developed standards; the number after the + indicates the number of standards MPEG is currently developing jointly with that body.

Is there a reason why MPEG has succeeded? Probably more than one, but I would like to mention one in particular: MPEG has created standards for interoperability where industry used to develop standards as barriers. Was MPEG unique in its driving thoughts? No, it just applied the physiocratic principle “laissez faire, laissez passer” (let them do, let them pass), without any ideological connotation. Was MPEG unique in how it did it? Yes, because it first applied the principle to media standards. Was MPEG unique in its result? Yes. It created a largely homogeneous industry out of what used to be scattered and compartmentalised industries.

It is easy to look at the successes of the past. It is a nice exercise when you have reached the end of the path, but this is not the case for MPEG. Indeed MPEG faces a big challenge: after it has done the impossible, people expect it to do even better in the future. And MPEG had better not fail 🙁

The figure below depicts some of the challenges MPEG faces in the next few years.

A short explanation of the 8 areas of the figure:

  1. Maintenance of ~180 standards is what MPEG needs to do first of all. Industry has adopted MPEG standards by the tens, but that is not the end point; it is the start. Industry continuously expresses needs that come from the application of the MPEG standards it has adopted. These requests must be attended to.
  2. Immersive media is one of the biggest challenges faced by MPEG. We all wish to have immersive experiences: being physically here but feeling as if we were at a different place, subject to the experiences felt by those who are in that place. The challenges are immense. Addressing them requires a level of integration with the industry never seen before.
  3. Media for old and new users conveys two notions. The first is that “old” media are not going to die anytime soon: we will need conventional audio, good old 2D rectangular video and, even though it is hard to call them “old media”, point clouds. These media are for human users, but we see the appearance of a new type of user, machines, that are going to make use of audio and visual information transmitted from remote sites. This goal includes the current Video Coding for Machines (VCM) exploration.
  4. Internet of Media Things is a standard that MPEG has already developed, with the acronym IoMT. At this moment, however, it is more at the level of a basic infrastructure on which it will be possible to build support for such ambitious scenarios as Video Coding for Machines, where media information is captured and processed by a network of machines assembled or built to achieve a predetermined goal.
  5. Neural Network Compression (NNR) is another component of the same scenario. The current assumption is that in the future a lot, if not all, of the “traditional” processing, e.g. for feature extraction, will be accomplished using neural networks, and that components of “intelligence” will be distributed to devices, e.g. handheld devices but also IoMTs, to enable them to do a better or a new job. NNR is in its infancy in MPEG and much more can be expected from it.
  6. Genomic Data Compression has been shown to be viable by the MPEG-G standard. The notion of a single representation of a given type of data is a given in MPEG and has been the foundation of its success. That notion is alien to the genomic world, where different data formats are applied in different portions of genomic workflows, but its application will have beneficial effects as much as it did for the media industry.
  7. Other Data Compression is a vast field that includes all cases where data, possibly already in digital form, are currently handled in an inefficient way. Data compression is important not only because it reduces storage and transmission time/bandwidth requirements, but also because it provides data in a structured form suitable for further processing. Exploring and handling these opportunities is a long-term effort that will certainly provide rewarding opportunities.
  8. Finally, we should realise that, although MPEG holds the best compression and transport experts from the top academic and economic enterprises, we do not know the needs of all economic players. We should be constantly on alert, ready to detect the weak signal of today that will become mainstream tomorrow.

For as many years to come as it is possible to forecast today, industry and consumers will need MPEG standards.


MPEG standards, MPEG software and Open Source Software

Introduction

The MPEG trajectory is not the trajectory of an Information Technology (IT) group because MPEG is not an “IT group”. Today software plays a key role in MPEG standard development. However, for MPEG, software is a vitally important tool to achieve the goal of producing excellent standards; but it remains a tool. Clearly, because MPEG assembles so many industries with so many different agendas, there are certainly MPEG members for whom software is more than a tool.

In this article I will explore MPEG’s relationship with software and, in particular, Open Source Software (OSS).

Early days

In my early professional days I had the opportunity to be part of an old-style ICT standardisation effort, the European COST 211 project. A video codec specification (actually more than that, because it contained Systems aspects as well) was later submitted to and became a Recommendation of CCITT (today ITU-T) with the acronym H.120 and the title Codecs for videoconferencing using primary digital group transmission.

The specification was developed on the basis of contributions received, discussed, possibly amended and eventually added to the specification. There was no immediate “verification” of the effectiveness of adopted contributions because the group had to wait for hardware to be implemented. But hardware is (or at least was) a different beast than software. Four countries (DE, FR, IT and UK) implemented the specification, which was eventually confirmed by field trials using 2 Mbit/s satellite links where the 4 implementations were shown to interoperate.

MPEG-1 and MPEG-2

That happened in the years around 1980. Ten years later, MPEG started the development of the MPEG-1 Video and then the MPEG-2 Video standards using a different method.

MPEG assembled the first MPEG-1 Video Simulation Model (SM) at MPEG10 (1990/03). The SM was sort of comparable with the H.120 evolving specification because it was a traditional textual description. At MPEG12 (1990/08), MPEG started complementing the text of the standard with pseudo C-code, because people accustomed to writing computer programs found code snippets more natural than words to describe the operations performed by a codec.

In MPEG-1 and MPEG-2 times, active participants developed and maintained their own simulation software. Some time later, however, it was decided to develop reference software, i.e. a software implementation of the standard. Because ISO only cared about the text of the standard, the MPEG-1 and MPEG-2 reference software (the code) got lost. If anyone can trace the people owning it, please contact me.

Seen with the eyes of a software developer, the process of standard development in MPEG-1 and MPEG-2 times was rather awkward, because the – temporally overlapping – sequence of steps was:

  1. Produce a textual description of the standard
  2. Translate the text to the individual software implementing the Simulation Model
  3. Run the software, compare results and optimise the software
  4. Translate the software back to text/pseudo C-code.

Reference Software and Conformance Testing

People of the early MPEG days used software – just as intensely as today – because it was a tool that cut by orders of magnitude the time needed to develop the specification, while offering the opportunity to obtain a standard with better performance.

Another important development was the notion of conformance testing. The separation of the specification (the law) from the determination of conformance (the tribunal) was a major MPEG innovation. The reference software could be used to test an encoder implementation for conformance by feeding the bitstream produced by the implemented encoder to the reference decoder. Specially produced conformance testing bitstreams could be used to test a decoder for conformance.

Conformance testing and its reference software “tool” are an essential add-on to the standard because they give users the freedom to make their own implementations and enable the creation of ecosystems of interoperable implementations.
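The decoder-conformance idea can be sketched in a few lines. This is a toy illustration, not any standard's actual conformance procedure (real conformance specifications define precise per-standard matching rules, and some allow bounded arithmetic differences): decode a conformance bitstream with the implementation under test and check that its output matches the reference decoder's output.

```python
# Minimal sketch: compare a candidate decoder's output against the
# reference decoder's output (byte-exact matching, for illustration).
def decoders_match(test_output: bytes, reference_output: bytes) -> bool:
    """True if the decoder under test reproduced the reference output."""
    return test_output == reference_output

# Toy stand-ins for decoded raw video (in practice, large YUV buffers)
reference = bytes([16, 128, 128, 17, 129, 127])
candidate = bytes([16, 128, 128, 17, 129, 127])

print("conformant" if decoders_match(candidate, reference) else "non-conformant")
```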

Open Source Software (OSS)

OSS is a very large and impactful world in which software is not the tool to achieve a goal but the goal itself. Those adopting the Gnu’s Not Unix (GNU) General Public License (GPL) grant some basic rights to the users of the software they call “Free”. The terms can be (roughly) summarised as the rights to

  1. Distribute copies of the software
  2. Receive the source code or be able to get it
  3. Change the software or use pieces of it in new programs

in exchange for a commitment of the user to

  1. Give another recipient all the rights acquired
  2. Make sure that recipients are able to receive or get the software
  3. Make recipients aware that the software is a modification.

Two additional issues should be borne in mind:

  1. There is no warranty for GNU-licensed software
  2. Any patent required to operate the software must be licensed for everyone’s free use or not licensed at all.

MPEG-4

Development of the MPEG-4 Visual standard took another important turn, one that marked the convergence of the development practices of telecom, broadcasting and consumer electronics on one side and information technology on the other.

Unaware of the formalisation of OSS rules that was already taking place in the general IT world, MPEG made the decision to develop the MPEG-4 reference software collaboratively because

  • Better reference software would be obtained
  • The scope of MPEG-4 was so large that probably no company could afford to develop the complete software implementation of the standard
  • A software implementation made available to the industry would accelerate adoption of the standard
  • A standard with two different forms of expression would have better quality, because the removal of an ambiguity from one form of expression would help clarify possible ambiguities in the other.

MPEG-4 Visual had only one person in charge of the Test Model. All new proposals were assessed and, if agreed, converted to Core Experiments. If at least two participants from two different institutions brought similarly convincing improvement results, the proposal would be accepted and added to the VM.

The MPEG-4 software was therefore no longer just the tool to develop the standard: it became a tool – though not necessarily the only one – to make products based on the standard. This required a reversal of priorities, because the standard in textual form was still needed, but many users considered the standard expressed in a programming language as the real reference. This applied not just to those making software implementations, but often to those making more traditional hardware-based products and VLSI designs as well.

Therefore it was decided that the software version of the standard should have the same normative status as the textual part. This decision has been maintained in all subsequent MPEG standards.

Licensing software

While the previous approach, where every participant had their own implementation of the TM, did not raise the issue of “who owns the software?”, the new approach did. MPEG resolved it with the following rules, labelled as “copyright disclaimer”:

  • Whoever makes a proposal that is accepted must provide a software implementation and assign the copyright of the code to ISO
  • ISO grants a license of the copyright of the code for products conforming to the standard
  • Proponents are not required to release patents that are needed to exercise the code and users should not expect that the copyright release includes a patent licence.

More recently, MPEG has started using a modified version of the Berkeley Software Distribution (BSD) licence, originally used to distribute a Unix-like operating system. This licence, initially called the “MXM licence” from the name of the MPEG-M Part 2 standard “MPEG Extensible Middleware (MXM)”, simply says that the code may be used as prescribed by the BSD licence, with the usual disclaimer that patents are not released. This licence is particularly interesting for software companies that do not want liabilities when using software not developed internally.

MPEG and OSS are close, but not quite the same

Let me summarise the main elements of what drives MPEG to develop textual standards and software that implements them:

  1. MPEG develops the best standards satisfying identified requirements. Best standards must use technologies resulting from large R&D investments typically made by for-profit entities.
  2. MPEG uses a competitive and transparent process to acquire (Call for Proposals) and refine (Core Experiment) technologies. Today that process largely uses collaboratively developed software, with methodologies that resemble those of the OSS community
  3. Typically, an MPEG standard is available in two forms: The standard expressed in natural language (possibly with some pseudo C-code inside to improve clarity) and the standard expressed in computer code.
  4. Both forms of the standard have the same normative value. If discrepancies are found, the group will decide which form is correct and amend the other.
  5. Reference Software and Conformance Testing are attached to MPEG standards. The former is used to test encoder implementations for conformance and the latter to test decoder implementations for conformance.
  6. Users should not expect that reference software be “product level”, even though in some cases it is. Reference software is only required to correctly implement the textual part of the standard.
  7. MPEG standards are typically Option 2, i.e., essential patents may exist but those patents can be licensed on FRAND terms.

Who is better?

The question is not meaningful if we do not specify the context in which the standard or software is used. The context is the scope of MPEG standards, not general standardisation.

I claim that only a standard that responds to the 7 drivers of the previous section can

  1. Embed state-of-the-art technology
  2. Be implemented by a variety of independent entities in different industries
  3. Allow for incremental evolutions in response to user requirements
  4. Stimulate the appearance of constantly improved implementations
  5. Enable precise assignment of responsibilities (e.g. for legal purposes).

Conclusions

I am sure that some will think differently on the subject of the previous section and I will certainly be willing to engage in a discussion.

I believe that Open Source Software is a great movement that has brought a lot to humankind. However, I do not think that it is adequate to create an environment that responds to the 7 drivers within the scope of MPEG standards.

The MPEG Metamorphoses

Introduction

In past publications, I have often talked about how many times MPEG has changed its skin during its 3-decade long life. In this article I would like to add substance to this claim by giving a rather complete, albeit succinct, account. You can find a more detailed story at Riding the Media Bits.

The early years

MPEG-1

MPEG started with the idea of creating a video coding standard for interactive video on compact disc (CD). The idea of opening another route to video coding standards had become an obsession of mine because I had been working for many years in video coding research without seeing any trace of consumer-level devices for what was touted as the killer application of the time: video telephony. I thought that if the manufacturing prowess of the Consumer Electronics (CE) industry could be exploited, that industry could supply telco customers with those devices so that telcos would be pushed into upgrading their networks to digital in order to withstand the expected high videophone traffic.

The net bitrate from CD – 1.4 Mbit/s – is close to the 1.544 Mbit/s of the primary digital multiplex in the USA and Japan. Therefore it was natural to set a target bitrate of 1.5 Mbit/s as a token of the CE and telco convergence (at the video terminal level).

At MPEG1 (1988/05) 29 experts attended. The work plan was agreed to be MPEG-1 at 1-1.5 Mbit/s, MPEG-2 at 1.5-10 Mbit/s and MPEG-3 at 10-60 Mbit/s (the numbering of standards came later).

For six months all activities happened in single sessions. However, 3 areas were singled out for specific activities: quality assessment (Test), complexity issues in implementing video codecs in silicon (VLSI) and characteristics of digital storage media (DSM). The last activity was needed because CD was a type of medium quite dissimilar from the telecom networks and broadcast channels with which video coding experts were familiar.

In the following months I dedicated my efforts to quelling another obsession of mine: humans do not generally value video without audio. The experience of the ISDN videophone, where, for organisational reasons, video was compressed by 3 orders of magnitude into 64 kbit/s while audio was kept uncompressed in another 64 kbit/s stream, pushed me into creating an MPEG subgroup dedicated to Audio coding. Audio, however, was not the speech used in videotelephony (for which there were plenty of experts in ITU-T), but the audio (music) typically recorded on CDs. Therefore action was required lest MPEG end up like videoconferencing, with a state-of-the-art video compression standard but no audio (music), or with a quality unsatisfactory for the target “entertainment-level” service.

The Audio subgroup was established at MPEG4 (1988/10) under the chairmanship of Hans Mussmann, just 7 months after MPEG1, while the Video subgroup was established at MPEG7 (1989/07), under the chairmanship of Didier Le Gall, about a year after MPEG1.

The other concern of mine was that integrating the audio component in a system that had not been designed for that could lead to some technical oversights that could be only belatedly corrected with some abominable hacks. Hence the idea of a “Systems” activity, initially similar to the H.221 function of the ISDN videophone (a traditional frame and multiframe-based multiplexer), but with a better performance because I expected it to be more technically forward looking.

At MPEG8 (1989/11) all informal activities were formalised into subgroups: Test (Tsuneyoshi Hidaka), DSM (Takuyo Kogure), Systems (Al Simon) and VLSI (Colin Smith).

MPEG-2

Discussions on what would eventually become the MPEG-2 standard started at MPEG11 (1990/07). The scope of the still ongoing MPEG-1 project was nothing compared to the ambitions of the MPEG-2 project. The goal of MPEG-2 was to provide a standard that would enable the cable, terrestrial TV, satellite television, telco and packaged media industries – worth in total hundreds of billions of USD – to go digital in compressed form.

Therefore, at MPEG12 (1990/09) the Requirements group was established under the chairmanship of Sakae Okubo, the rapporteur of the ITU-T Specialists Group on Coding for Visual Telephony. This signalled the fact that MPEG-2 Video (and Systems) were joint projects. The mandate of the Requirements Group was to distil the requirements coming from the different industries into one coordinated set of requirements.

The Audio and Video subgroups had their minds split in two, with one half engaged in finishing their MPEG-1 standards and the other half initiating work on the next MPEG-2 standard. This was just the first time MPEG subgroups had to split their minds.

In those early years subgroup chairs changed rather frequently. At MPEG9 (1990/02) Colin (VLSI) was replaced by Geoff Morrison and the name of the group was changed to Implementation Study Group (ISG) to signal the fact that not only hardware implementation was considered, but software implementation as well. At MPEG12 (1990/03) Al (Systems) was replaced by Sandy MacInnis and Hans (Audio) was replaced by Peter Noll.

MPEG29 (1994/11) approved the Systems, Video and Audio parts of the MPEG-2 standard and some of the subgroup chairs saw their mission as accomplished. The first move was at MPEG28 (1994/07) when Sandy (Systems) was replaced by Jan van der Meer to finalise the issues left over from MPEG-2.

The MPEG subgroups did a great job in finishing several pending MPEG-2 activities such as MPEG-2 Video Multiview and 4:2:2 profiles, MPEG-2 AAC, DSM-CC and more.

A new skin of coding

In the early 1990s, MPEG-1 was not finished and MPEG-2 had barely started, but talks about a new video coding standard for very low bitrates (e.g. 10 kbit/s) were already under way. The name eventually assigned to the project was MPEG-4, because the MPEG-3 standard envisaged at MPEG1 had been merged with MPEG-2 by bringing the upper bound of the bitrate range to 10 Mbit/s.

MPEG-4, whose title eventually settled to Coding of Audio-Visual Objects, was a completely different standard from the preceding two in that it aimed at integrating the world of audio and video, so far under the purview of broadcasting, CE and telecommunication, with the world of 3D Graphics, definitely within the purview of the Information Technology (IT) industry.

At MPEG20 (1992/11) a new subgroup called Applications and Operational Environments (AOE) was established under the chairmanship of Cliff Reader. This group took charge of developing the requirements for the new MPEG-4 project and spawned three groups inside it: “MPEG-4 Requirements”, “Synthetic and Natural Hybrid Coding (SNHC)” and “MPEG-4 Systems”.

The transition from the “old MPEG” (MPEG-1 and MPEG-2) to the “new MPEG” (MPEG-4) was quite laborious, with many organisational and personnel changes. At MPEG30 Didier (Video) was replaced by Thomas Sikora and Peter (Audio) was replaced by Peter Schreiner. At MPEG32 Geoff (ISG) was replaced by Paul Fellows and Tsuneyoshi (Test) was replaced by Laura Contin.

MPEG-4 Visual was successfully concluded thanks to the great efforts of Thomas (Video) and Laura (Test) and the very wide participation by experts. The foundations of the extremely successful AAC standards were laid down by Peter (Audio) and the Audio subgroup experts.

At MPEG34 (1996/03) C. Reader left MPEG and at MPEG35 (1996/07) a major reorganisation took place:

  1. The “AOE requirements” activity was mapped to the Requirements subgroup under the chairmanship of Rob Koenen, after a hiatus of 3 meetings following the departure of Sakae (Requirements).
  2. The “AOE systems” activity was mapped to the Systems subgroup under the chairmanship of Olivier Avaro.
  3. The “AOE SNHC” activity became a new SNHC subgroup under the chairmanship of Peter Doenges. Peter was replaced by Euee Jang at MPEG49 (1999/10).

At MPEG40 (1997/07) a DSM activity became a new subgroup with the name Delivery Multimedia Integration Framework (DMIF) under the chairmanship of Vahe Balabanian. DMIF addressed the problem of virtualising the distribution medium (broadcast, network and storage) as seen from the Systems level by defining appropriate interfaces (APIs). At MPEG47 (1999/03) Guido Franceschini took over for a 2-meeting tenure, after which the DMIF subgroup was closed (1999/07).

At MPEG41 Peter (Audio) was replaced by Schuyler Quackenbush who since then has been running the Audio group for 23 years and is the longest-serving MPEG chair.

At MPEG46 (1998/12) Paul (ISG) was replaced by Marco Mattavelli. Under Marco’s tenure, such standards as MPEG-4 Reference hardware description, an extension to VHDL of the notion of Reference Software, and Reconfigurable Media Coding were developed.

The MPEG-4 standard is unique in MPEG history. MPEG-1 and -2 were great standards because they brought together established large industries with completely different agendas, but MPEG-4 is the standard that bonded the initial MPEG industries with the IT industry. The standard had big challenges, and Chairs and experts dedicated enormous resources to the project to face them: video objects, audio objects, synthetic audio and video, VRML extensions, file format and more. MPEG-4 is a lively standard even today, almost 30 years after we first started working on it, and has the largest number of parts.

Liaisons

At MPEG33 (1996/01) the Liaison subgroup was created under the chairmanship of Barry Haskell to handle the growing network of organisations MPEG was liaising with (~50). At MPEG56 Barry, a veteran of the video coding old guard, left MPEG and at MPEG57 (2001/07) Jan Bormans took over and continued until MPEG71 (2005/01) when Kate Grant took over. The Liaison subgroup was closed at MPEG84 (2008/04). Today liaisons are coordinated at Chairs meeting, drafted by the relevant subgroup and reviewed by the plenary.

An early skin change

In 1996 MPEG started addressing MPEG-7, a media-related standard with a completely different nature from the preceding three: it was about media descriptions and their efficient compression. At MPEG48 (1999/07) it became clear that a new subgroup, called Multimedia Description Schemes (MDS), was needed to carry out part of the work.

Philippe Salembier was put in charge of the MDS subgroup, which was initially in charge of all MPEG-7 matters that did not involve Systems, Video and Audio. At MPEG56 (2001/03) John Smith took over the position, which he held until MPEG70 (2004/10) when Ian Burnett took over until the MDS group was closed at MPEG87 (2009/02).

The media description skin has had several revivals since then. One is the Part 13 – Compact Descriptors for Visual Search (CDVS) standard of the first half of the 2010s. Another is the Part 15 – Compact Descriptors for Video Analysis (CDVA) standard developed in the middle-to-second half of the 2010s. Finally, Part 17 – Compression of neural networks for multimedia content description and analysis is preparing a basic compression technology for neural network-based media description.

Another video coding

At MPEG46 (1998/12) Laura (Test) was replaced by Vittorio Baroncini. At MPEG54 (2000/10) Thomas (Video) left MPEG and at MPEG56 (2001/03) Jens-Rainer Ohm was appointed as Video chair.

Vittorio brought the expertise to carry out the subjective tests required by the collaboration with ITU-T SG 16, restarted to develop the Advanced Video Coding (AVC) standard. At MPEG58 (2001/12) Jens was appointed as co-chair of a joint subgroup with ITU-T called the Joint Video Team (JVT). The other co-chair was Gary Sullivan, rapporteur of the ITU-T SG 16 Video Coding Experts Group (VCEG). The JVT continued its work until well after the AVC standard was released at MPEG64 (2003/03). Since then Gary has attended the chairs meetings as a token of the collaboration between the two groups.

Still media-related, but a different “coding”

At MPEG49 (1999/10) the many inputs received from the market prompted me to propose that MPEG develop a new standard with the following vision: “Every human is potentially an element of a network involving billions of content providers, value adders, packagers, service providers, resellers, consumers …”.

The standard was eventually called MPEG-21 Multimedia Framework. MPEG-21 can be described as the “suite of standards that enable media ecommerce”.

The MDS subgroup was largely in charge of this project which continued during the first decade of the 2000s with occasional revivals afterwards. Today MPEG-21 standards are handled by the Systems subgroup.

Under the same heading of “different coding” it is important to mention the Open Font Format (OFF), a standard built on the request made by Adobe, Apple and Microsoft to maintain the OpenType specification. The word “maintenance” has a different meaning in MPEG, because OFF has had many extensions, developed “outside” MPEG in an open ad hoc group with strong industry participation and ratified by MPEG.

A standard of standards

In the early 2000s MPEG could look back at its first decade and a half of operation with satisfaction: its standards covered video, audio and 3D Graphics coding, systems aspects, transport (MPEG-2 TS and MPEG-4 File Format) and more. While refinements of its already impressive assets were under way, MPEG wondered whether there were other areas it could cover. The answer was: the coding of “combinations of MPEG coded media”. That was the beginning of a long series of 20 standards originally developed by the groups in charge of the individual media, e.g. Part 2 – MPEG music player application format was developed by the Audio subgroup and Part 3 – MPEG photo player application format was developed by the Video subgroup. Today all MPEG-A standards, e.g. the very successful Part 19 – Common Media Application Format, are developed by the Systems subgroup.

The mid 2000s

Around the mid 2000s MPEG felt that there was still a need for more Systems, Video and Audio standards, but did not have the usual Systems, Video and Audio “triad” umbrella it had had until then with MPEG-1, -2, -4 and -7. So it decided to create containers for those standards and called them MPEG-B (Systems), MPEG-C (Video) and MPEG-D (Audio).

MPEG also ventured into new areas:

  1. Specification of a media device software stack (MPEG-E)
  2. Communication with and between virtual worlds (MPEG-V)
  3. Multimedia service platform technologies (MPEG-M)
  4. Rich media user interfaces (MPEG-U)

Rob (Requirements) continued until MPEG58 (2001/12). He was replaced by Fernando Pereira until MPEG64 (2003/04) when Rob returned, holding his position until MPEG71 (2005/01) when Fernando took over again until MPEG82 (2007/10) when he left MPEG.

The Requirements subgroup is the “control board” of MPEG in the sense that Requirements gives proposed standards the shape that the operational groups will implement after the Call for Proposals. Therefore the Rob-Fernando duo has been in the control room of MPEG for some 40% of MPEG’s life.

Vittorio (Test) continued until MPEG68 (2004/03), when he was replaced by T. Oelbaum, who held the position until MPEG81 (2007/07).

Olivier (Systems) kept his position until MPEG86 (2008/07) when he left MPEG to pursue his entrepreneurial ambitions. Olivier has been in charge of the infrastructure that keeps MPEG standards together for 13 years and is the third longest-serving MPEG chair.

Euee (SNHC) kept his position until MPEG59 (2002/03). He was replaced by M. Bourges-Sévenier, who continued until MPEG70 (2004/10). Mikaël was then replaced by Mahnjin Han, who continued until MPEG78 (2006/10). The SNHC subgroup has produced valuable standards. However, these have had a hard time penetrating an industry that is content with less performing but freely available standards.

The return of the triad

The end of the years 2000s signaled a major change in MPEG. When Fernando (Requirements) left MPEG at MPEG82 (2007/10), the task of developing requirements was first assigned to the individual groups. The experiment lasted 4 meetings but it demonstrated that it was not the right solution. Therefore, Jörn Ostermann was appointed as Requirements chair at MPEG87 (2009/02). That was just in time for the handling of the requirements of the new Audio-Video-Systems triad-based MPEG-H standard.

MPEG-H included the MPEG Media Transport (MMT) part, the video coding standard that eventually became High Efficiency Video Coding (HEVC), and 3D Audio. MPEG-H was adopted by the ATSC as a tool to implement new forms of broadcasting services where traditional broadcasting and the internet not only coexist but cooperate.

The Requirements, and then the Systems, subgroups were also quickly overloaded by another project, called DASH, aiming at “taming” the internet, i.e. turning it from an unreliable transport into one the end-user device could adapt to.

The two Systems projects – MMT and DASH – were managed by Youngkwon Lim who took over from Olivier at MPEG86 (2008/10).

At MPEG87 (2009/01) the MDS subgroup was closed. At the same meeting, Vittorio resumed his role as chair of the Test subgroup, just in time for the new round of subjective tests for the HEVC Call for Evidence and Call for Proposals.

The Joint Collaborative Team on Video Coding between ITU-T and MPEG (JCT-VC) was established at MPEG92 (2010/04), co-chaired by Gary and Jens as in the AVC project. At its peak, the VC group was very large and processed in excess of 1,000 documents per meeting. When the group was still busy developing the main (2D video coding) part of HEVC, 3D video coding became important and a new subgroup called JCT-3V (joint with ITU-T) was established at MPEG100. The 3V subgroup closed its activities at MPEG115 (2016/05), while the VC subgroup is still active, mostly in maintenance mode.

The recent years

In the first half of the 2010s MPEG developed the Augmented Reality Application Format and, in a joint ad hoc group with SC 24/WG 9, the Mixed and Augmented Reality (MAR) Reference Model.

In 2016 MPEG kicked off the work on MPEG-I – Coded representation of immersive media. Part 3 of this is Versatile Video Coding (VVC), the latest video coding standard developed by the new Joint Video Experts Team (JVET) between ITU-T and MPEG established at MPEG114 (2016/02). It is expected to become FDIS at MPEG131 (2020/06).

The JVET co-chairs are again Jens and Gary. In the anticipation – regularly materialised – that JVET would again be overloaded by contributions, Jens was replaced as Video chair by Lu Yu at MPEG121 (2018/01).

The Video subgroup is currently engaged in two 2D video coding standards of rather different nature, Essential Video Coding (EVC) and Low Complexity Enhancement Video Coding (LCEVC), and is working on the MPEG Immersive Video (MIV) project, due to become FDIS at MPEG134 (2021/03).

MIV is connected with another exciting area that in this article we left with the name of SNHC under the chairmanship of Mahnjin. At MPEG79 (2007/01) Marius Preda took over SNHC from Mahnjin to continue the traditional SNHC activities. At MPEG89 (2009/06) SNHC was renamed 3D Graphics (3DG).

In the mid 2010s the 3DG subgroup started several explorations, in particular Point Cloud Compression (PCC) and Internet of Media Things (IoMT). The former has split into two standards: Video-based (V-PCC) and Graphics-based (G-PCC). The latter has recently reached FDIS.

Another promising activity started at MPEG109 (2014/03) and has now become the Genomic Information Representation (MPEG-G) standard. This standard signals the intention to bring the benefits of compression to industries other than media that process other data types.

Conclusions

This article was a long overview of 32 years of MPEG life. The intention was not to talk about MPEG standards, but about how the MPEG organisation morphed to suit the needs of standardisation.

Of course, structure without people is nothing. It was obviously not possible to mention the thousands of experts who made MPEG standards, but I thought that it was my duty to record the names of the subgroup chairs who drove their development. You can see a complete table of all meetings and MPEG Chairs here.

In recent years the MPEG structure has remained stable, but there is always room for improvement. However, this must be driven by needs, not by ideology.

One possible improvement is to make the Genomic data coding activity a formal subgroup as a first step in anticipation of more standards to code other non-media data. The other is to inject more market awareness into the phase that defines the existence first and then the characteristics of MPEG standards.

But this is definitely another story.

National interests, international standards and MPEG

Having spent a considerable amount of my time in standardisation, I have developed my own definition of standard: “the documented agreement reached by a group of individuals who recognise the advantage of all doing certain things in an agreed way”. Indeed, I believe that, if we exclude some areas such as safety, in matters of standards the authority principle should not hold. Forcing free people to do things against their interest is an impossible endeavour. If doing certain things in a certain way is not convenient, people will shun a standard even if it bears the most august credentials.

Medieval Europe was a place where my definition of standard reached an atomic level. However, with the birth of national centralised states and, later, the industrial revolution, national standards came to the fore. Oddly enough, national standards institutions such as the British Standards Institute (BSI), originally called Engineering Standards Committee and probably the first of its kind, were established just before World War I, when the first instance of modern globalisation took shape.

Over the years, national standards became a powerful instrument to further a country’s industrial and commercial interests. As late as 1989 MPEG had trouble displaying 625/50 video coding simulation results at a USA venue because import of 625/50 TV sets in the country was forbidden at that time (and no one had an interest in making such sets). This “protection of national interests” is the cause of the 33 pages of the ITU-R Report 624 – Characteristics of television systems of 1990 available here containing tables and descriptions of the different analogue television systems used at the time by the 193 countries of the United Nations.

The same spirit of “protecting national interests” informed the CCITT SGXV WG4 Specialists Group on Coding for Visual Telephony (that everybody at that time called the Okubo group) when it defined the Common Intermediate Format (CIF) in Recommendation H.261 to make it possible for a 525/60 camera to communicate with a 625/50 monitor (and for a 625/50 camera with a 525/60 monitor).

That solution was a “compromise” video format (actually not a real video format because it was used only inside the video codec) with one quarter of the 625/50 spatial resolution and one half the 525/60 temporal resolution. This was a typical political solution of the time (and one that 525/60 people later regretted because the spatial interpolation required by CIF was more onerous than the temporal interpolation in 625/50). Everybody (but me, who opposed the solution) felt happy because everybody had to “share the burden” when communicating across regions with different video formats.

International standardisation is split into 3 – IEC, ISO and ITU – but IEC and ISO share the principle that standards for a technical area are developed by a Technical Committee (or a Subcommittee) managed by an international Secretariat funded and staffed by a national standards organisation (a so-called National Body). Things in ITU are slightly different because ITU itself provides the secretariat, whose personnel is provided by national administrations.

In the traditional context of standards being established by a national standards committee to protect the national interest, an international standards committee was seen as the place where national interests, as represented by their national standards bodies, had to be protected. Therefore, holding the secretariat of a committee was seen as a major achievement for the country that ran the secretariat. As an emblem of the achievement, the country had the right to nominate (in practice, appoint) the chairperson of the committee (in some committees this is rigorously enforced. In some others, things are taken more lightly).

That was then, but actually it is still so even now in many standardisation contexts. The case of CIF mentioned above shows that, in the area of video coding standards, then the prerogative of the ITU-T “for Visual Telephony”, the influence of national interests was still strong. MPEG, however, changed the sides of the equation. One of the first things it did when developing MPEG-1 Video was to define test sequences in both 525/60 and 625/50, and then issue a Call for Proposals where respondents could submit coded sequences in one or the other format, at their choice. MPEG did not use CIF but SIF, where the format was either a quarter of the spatial resolution and one half of the temporal resolution of 625/50 (i.e. 288 lines x 352 pixels) or a quarter of the spatial resolution and one half of the temporal resolution of 525/60 (i.e. 240 lines x 352 pixels).
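The SIF arithmetic can be worked through in a few lines. This is an illustrative sketch: the struct and function names are mine, and the active source sizes I start from (704x576 at 50 Hz and 704x480 at 60 Hz) are an assumption for the illustration, since the text only gives the resulting SIF numbers.

```c
/* SIF takes a quarter of the spatial resolution (half horizontally,
   half vertically) and half the temporal resolution of the source
   TV system. */
struct video_fmt { int lines; int pels; int rate; };  /* rate in Hz */

static struct video_fmt sif_from(struct video_fmt tv)
{
    struct video_fmt s;
    s.lines = tv.lines / 2;  /* half vertical resolution */
    s.pels  = tv.pels / 2;   /* half horizontal resolution */
    s.rate  = tv.rate / 2;   /* half temporal resolution */
    return s;
}
```

Applied to a 625/50 source this yields 352 pels x 288 lines at 25 Hz; applied to a 525/60 source, 352 pels x 240 lines at 30 Hz – the two SIF variants mentioned above.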

By systematically defusing political issues and converting them to technical issues, MPEG succeeded in the impossible task of defining compressed media formats with an international scope. However, by kicking political issues out of the meeting rooms, MPEG changed the nature and role of the parent subcommittee SC 29’s chairmen and secretariat. The first yearly SC 29 plenary meetings lasted 3 days, but later the duration was reduced to 1 day and in some cases all matters were handled in half a day.

One of the most contentious areas of standardisation (remember the epic battles on the HDTV production format of 1986 and before) was completely tamed and reduced to technical battles where experts assess the quality of the solution proposed and not how it is dressed in political clothing. This does not mean that the battles are not epic, but for sure they are rational.

I do not remember having heard complaints on the part of the industry regarding the de-politicised state of affairs in media coding standardisation. Therefore it is time to ask if we should not be dispensed from the pompous ritual of countries expressing national interests through national bodies in secretariats and chairs of international standards committees when in fact there are global industrial interests poorly mapped through a network of countries actually deprived of national interests.

Posts in this thread

 

 

Media, linked media and applications

Introduction

In a technology space moving at an accelerated pace like the one MPEG develops standards for, it is difficult to have a clear plan for the future (MPEG has a 5-year plan, though).

Still, when MPEG was developing the Multimedia Linking Application Format (MLAF), it “discovered” that it had developed or was developing several related standards – MPEG-7, Compact Descriptors for Visual Search (CDVS), Compact Descriptors for Video Analysis (CDVA) and Media Orchestration.

The collection of these standards (and of others in the early phases of conception or development, e.g. Neural Network Compression and Video Coding for Machines) helps create the Multimedia Linking Environment, i.e. an environment where it is possible to create a link between a given spatio-temporal region of a media object and spatio-temporal regions in other media objects.

This article explains the benefits brought by the MLAF “multimedia linking” standard, including in very concrete applications.

Multimedia Linking Environment

Until a quarter of a century ago, virtually the only device that could establish relationships between different media items was the brain. A very poor substitute was a note in a book recording a possible relationship between the place in the book where the note was written and content in the same or in different books.

The possibility to link a place in a web page to another place in another web page, or to a media object, was the great innovation brought by the web. However, a quarter of a century later, with a billion web sites and quadrillions of linked web pages, we must recognise that the notion of linking is a pervasive one and not necessarily connected with the web.

MPEG has dedicated significant resources to the problem described by the sentence “I have a media object and I want to know which other related media objects exist in a multimedia data base” and represented in the MPEG-7 model depicted in the figure below.

However, MPEG-7 is an instance of the more general problem of linking a given spatio-temporal region of a media object to spatio-temporal regions in other media objects.

These are some examples:

  1. A synthetic object is created out of a number of pictures of an object. There is a relationship between the pictures and the synthetic object;
  2. There is a virtual replica of a physical place. There is a relationship between the physical place and the virtual replica;
  3. A user is experiencing a virtual place in a virtual reality application derived from another virtual place. There is a relationship between the two virtual places;
  4. A user creates a media object by mashing up a set of media items coming from different sources. There is a relationship between the media items and the mashed up media object.

MPEG has produced MPEG-A part 16 (Media Linking Application Format – MLAF), which specifies a data format called bridget that can be used to link any kind of media. MPEG has also developed a number of standards that play an auxiliary role in the “media linking” context outlined by the examples above.

  1. MPEG-7 parts 1 (Systems), 3 (Visual), 4 (Audio) and 5 (Multimedia) provide the systems elements, and the visual (image and video), audio and multimedia descriptions.
  2. MPEG-7 parts 13 (Compact descriptors for visual search) and 15 (Compact descriptors for video analysis) provide new-generation image and video descriptors.
  3. MPEG-B part 13 (Media Orchestration) provides the means to mash up media items and other data to create personal user experiences.

The MLAF standard

A bridget is a link between a “source” content and a “destination” content. It contains information on

  1. The source and the destination content
  2. The link between the two
  3. The information to be presented to the users who consume the source content.

The last information is the most relevant to the users because it is the one that enables them to decide whether the destination content is of interest to them.
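The three informational components of a bridget listed above can be modelled, very schematically, as follows. This is NOT the normative MLAF syntax (ISO/IEC 23000-16 defines it on top of MPEG-21 and MPEG-7 tools); all class and field names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Very schematic sketch of the components of a bridget; purely illustrative.

@dataclass
class SpatioTemporalScope:
    start_s: float                                       # start of the time segment
    end_s: float                                         # end of the time segment
    region: Optional[Tuple[int, int, int, int]] = None   # optional (x, y, w, h)

@dataclass
class Bridget:
    source: str                  # identifier of the source content
    destination: str             # identifier of the destination content
    scope: SpatioTemporalScope   # where in the source the link is anchored
    presentation: str            # information presented to the consuming user

b = Bridget("tv://program/42", "https://example.com/clip",
            SpatioTemporalScope(12.0, 18.5), "More about this scene")
print(b.destination)
```

The point of the sketch is only to show that a bridget bundles the link itself (source, destination, scope) with the presentation information that lets the user decide whether to follow it.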

The structure of the MLAF representation (points 1 and 2) is based on the MPEG-21 Digital Item Container implemented as a specialised MPEG-21 Annotation. The spatio-temporal scope is represented using two MPEG-7 tools, complemented by the general descriptive capability of the MPEG-21 Digital Item. These allow a bridget author to specify a wide range of possible associations and to be as precise and granular as needed.

The native format to present bridget information is based on the MPEG-4 Scene description and application engine. Nevertheless, a bridget can be directly linked to any external presentation resource (e.g., an HTML page, an SVG graphic or others).

Bridgets for companion screen content

An interesting application of the MLAF standard is depicted in the figure below, which describes the entire bridget workflow.

    1. A TV program, scheduled to be broadcast at a future time, is uploaded to the broadcast server [1] and to the bridget Authoring Tool (BAT) [2].
    2. BAT computes and stores the program’s audio fingerprints to the Audio Fingerprint Server (AFS) [3].
    3. The bridget editor uses BAT to create bridgets [4].
    4. When the editor is done all bridgets of the program and the referenced media objects are uploaded to the Publishing Server [5].
    5. At the scheduled time, the TV program is broadcast [6].
    6. The end user’s app computes the audio fingerprint and sends it to the Audio Fingerprint Server [7].
    7. AFS sends to the user’s app the ID and time of the program the user is watching [8].
    8. When the app alerts the user that a bridget is available, the viewer may decide to
      1. Turn their eyes away from the TV set to their handset
      2. Play the content in the bridget [9]
      3. Share the bridget to a social network [10].
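Steps 6 to 8 of the workflow above (the client side) can be sketched as follows. All function names and data are hypothetical stand-ins, not APIs defined by the standard.

```python
import zlib

# Hypothetical stand-in for a real audio fingerprinting algorithm.
def compute_audio_fingerprint(samples):
    return zlib.crc32(bytes(samples))

# Hypothetical stand-in for the Audio Fingerprint Server (AFS) lookup:
# returns (program_id, media_time) or None if the program is unknown.
def identify_program(fingerprint, fingerprint_db):
    return fingerprint_db.get(fingerprint)

# Toy database: one known fingerprint mapped to a program ID and a time.
known_samples = [1, 2, 3, 4]
db = {compute_audio_fingerprint(known_samples): ("program-42", 37.5)}

# The app fingerprints what the microphone hears and queries the AFS.
match = identify_program(compute_audio_fingerprint([1, 2, 3, 4]), db)
print(match)  # the app now knows which program is on air and at what time
```

A real system would of course use robust acoustic fingerprints that tolerate noise, loudspeaker playback and compression; the CRC here only illustrates the lookup principle.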

This is the workflow of a recorded TV program. A similar scenario can be implemented for live programs. In this case bridgets must be prepared in advance so that the publisher can select and broadcast a specific bridget when needed.

Standards are powerful tools that facilitate the introduction of new services, such as companion screen content. In this example, the bridget standard can stimulate the creation of independent authoring tools and end-user applications.

Creating bridgets

The bridget creation workflow depends on the types of media object the bridget represents.

Let’s assume that the bridget contains different media types such as an image, a textual description, an independently selectable sound track (e.g. an ad) and a video. Let’s also assume that the layout of the bridget has been produced beforehand.

This is the sequence of steps performed by the bridget editor:

  1. Select a time segment on the TV program timeline and a suitable layout
  2. Enter the appropriate text
  3. Provide a reference image (possibly taken from the video itself)
  4. Find a suitable image by using an automatic image search tool (e.g. based on the CDVS standard)
  5. Provide a reference video clip (possibly taken from the video itself)
  6. Find a suitable video clip, possibly taken from the video itself, by using an automatic video search tool (e.g. based on the CDVA standard)
  7. Add an audio file.

The resulting bridget will appear to the end user like this.

When all bridgets are created, the editor saves the bridgets and the media to the publishing server.

It is clear that the “success” of a bridget (in terms of number of users who open it) depends to a large extent on how the bridget is presented.

Why bridgets

Bridget was the title of a research project funded by the 7th Framework Research Program of the European Commission. The MLAF standard (ISO/IEC 23000-16) was developed at the instigation, and with the participation, of members of the Bridget project.

At this page you will find more information on how the TVBridge application can be used to create, publish and consume bridgets for recorded and live TV programs.


Standards and quality

Introduction

Quality pervades our life: we talk of quality of life and we choose things on the basis of declared or perceived quality.

A standard is a product and, as such, may also be judged, although not exclusively, in terms of its quality. MPEG standards are no exception, and the quality of MPEG standards is a feature that MPEG has considered of paramount importance since its early days.

Cosmesis is related to quality, but is a different beast. You can apply cosmesis at the end of a process, but that will not give quality to a product issued from that process. Quality must be an integral part of the process or not at all.

In this article I will describe how MPEG has embedded quality in all phases of its standard development process and how it has measured quality in some illustrative cases.

Quality in the MPEG process

The business of MPEG is to produce standards that process information in such a way that users do not notice, or notice as little as possible, the effect of that processing when the standard is implemented in a product or service.

When MPEG considers the development of a new standard, it defines the objective of the standard (say, compression of video of a particular range of resolutions), the range of bitrates and the functionality. Typically, MPEG makes sure that it can deliver the standard with the agreed functionality by issuing a Call for Evidence (CfE). Industry members are requested to provide evidence that their technology is capable of achieving part or all of the identified requirements.

Quality is now an important, if not essential, parameter for making a go/no-go decision. When MPEG assesses the CfE submissions, it may happen that established quality assessment procedures are found inadequate. That was the case of the 2009 Call for Evidence on High-Performance Video Coding (HVC). The high number of submissions received required the design of a new test procedure: the Expert Viewing Protocol (EVP). Later on, the EVP test method became Recommendation ITU-R BT.2095. While testing according to any other ITU recommendation of that time would have required more than three weeks, the EVP allowed the complete testing of all the submissions in three days.

If MPEG has become confident of the feasibility of the new standard from the results of the CfE, a Call for Proposals (CfP) is issued with attached requirements. These can be considered as the terms of the contract that MPEG stipulates with its client industries.

Testing of CfP submissions allows MPEG to develop a Test Model and initiate Core Experiments (CE). These aim at optimising individual parts of the entire scheme.

In most cases the result of CEs involves quality evaluation. In the case of CfP responses, subjective testing is necessary because there are typically large differences between the coding technologies proposed. However, in the assessment of CE results, where smaller effects are involved, objective metrics are typically, but not exclusively, used because formal subjective testing is not feasible for logistic or cost reasons.
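One widely used objective metric in such evaluations is PSNR (Peak Signal-to-Noise Ratio). The snippet below is a minimal sketch of its computation on 8-bit samples, not the procedure prescribed in any specific MPEG core experiment.

```python
import math

def psnr(original, decoded, max_value=255):
    """PSNR in dB between two equal-length lists of samples."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return float("inf")  # identical signals: infinite PSNR
    return 10 * math.log10(max_value ** 2 / mse)

# Toy 8-bit "pictures": a small coding error yields a high PSNR.
ref = [100, 120, 140, 160]
dec = [101, 119, 141, 159]
print(round(psnr(ref, dec), 2))  # prints 48.13
```

Because a number like this can be computed automatically on every decoded sequence, objective metrics scale to the many small comparisons a core experiment requires, where formal viewing sessions would not.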

When the development of the standard is completed, MPEG engages in the process called Verification Tests, which produces a publicly available report. This can be considered as the proof, on the part of the supplier (MPEG), that the terms of the contract with its customer have been satisfied.

Samples of MPEG quality assessment

MPEG-1 Video CfP

The first MPEG CfP quality tests were carried out at the JVC Research Center in Kurihama (JP) in November 1989. 15 proposals of video coding algorithms operating at a maximum bitrate of 1.5 Mbit/s were tested and used to create the first Test Model at the following Eindhoven meeting in February 1990 (see the Press Release).

MPEG-2 Advanced Audio Coding (AAC)

In February 1998 the Verification Test allowed MPEG to conclude that “when auditioning using loudspeakers, AAC coding according to the ISO/IEC 13818-7 standard gives a level of stereo performance superior to that given by MPEG-1 Layer II and Layer III coders” (see the Verification Test Report). This showed that the goal of high audio quality at 64 kbps per channel for MPEG-2 AAC had been achieved.

Of course that was “just” MPEG-2 AAC with no substantial encoder optimisation. More than 20 years of MPEG-4 AAC progress have brought down the bitrate per channel.

MPEG-4 Advanced Video Coding (AVC) 3D Video Coding CfP

The CfP for new 3D (stereo & auto-stereo) technologies was issued in 2012 and received a total of 24 complete submissions. Each submission produced 24 files representing the different viewing angles for each test case. Two sets of two and three viewing angles were blindly selected and used to synthesise the stereo and auto-stereo test files.

The test was carried out on standard 3D displays with glasses and on auto-stereoscopic displays. A total of 13 test laboratories took part in the test, running a total of 224 test sessions and hiring around 5000 non-expert viewers. Each test case was run by two laboratories, making it a fully redundant test.

MPEG-High Efficiency Video Coding (HEVC) CfP

The HEVC CfP covered 5 different classes of content, with resolutions from WQVGA (416×240) up to 2560×1600. For the first time MPEG introduced two sets of constraints (low delay and random access) for different classes of target applications.

The HEVC CfP was a milestone because it required the biggest testing effort ever performed by any laboratory or group of laboratories until then. The CfP generated a total of 29 submissions and 4205 coded video files plus the set of anchor coded files. Three testing laboratories took part in the tests, which lasted four months and involved around 1000 naïve (non-expert) subjects allocated to a total of 134 test sessions.

A common test set of about 10% of the total testing effort was included to monitor the consistency of results from the different laboratories. With this procedure it was possible to detect a set of low quality test results from one laboratory.
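The principle behind such a consistency check can be illustrated with a simple correlation between two laboratories' mean opinion scores (MOS) on the common test set. This is only a sketch of the idea, with invented numbers, not the procedure actually used in the HEVC tests.

```python
# Sketch: compare two labs' MOS on the common test set with Pearson
# correlation; a low correlation flags a lab whose results deserve scrutiny.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

lab_a = [4.2, 3.8, 2.9, 2.1, 1.5]   # MOS of lab A on the common set
lab_b = [4.0, 3.9, 3.0, 2.2, 1.4]   # lab B tracks lab A closely
lab_c = [2.0, 4.5, 1.9, 3.8, 2.6]   # lab C does not: a red flag

print(round(pearson(lab_a, lab_b), 3))  # close to 1.0
print(round(pearson(lab_a, lab_c), 3))  # near zero
```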

Point Cloud Compression (PCC) CfP

The CfP was issued to assess how a proposed PCC technology could provide 2D representations of the content synthesised using PCC techniques, resulting in videos suitable for evaluation by means of established subjective assessment protocols.

Video clips were produced for each of the received submissions after a careful selection of the rendering conditions. A video rendering tool was used to generate, under the same conditions, two different video clips for each submission: a rotating view of a fixed synthesised image and a rotating view of a moving synthesised video clip. The rotations were selected in a blind way and the resulting video clips were subjectively assessed to rank the submissions.

Conclusions

Quality is what end users of media standards value as the most important feature. To respond to this requirement, MPEG has designed a standards development process that is permeated by quality considerations.

MPEG has no resources of its own. Therefore, sometimes it has to rely on the voluntary participation of many competent laboratories to carry out subjective tests.

The domain of media is very dynamic and, very often, MPEG cannot rely on established methods – both subjective and objective – to assess the quality of new compressed media types. Therefore, MPEG constantly innovates the methodologies it uses to assess media quality.


How to make standards adopted by industry

Introduction

There are many definitions of standard. In the Webster’s you find a definition of standard as “Something that is established by authority, custom or general consent as a model or example to be followed”, an oldish definition that thinks that people must be directed to their good. In the Encyclopaedia Britannica you find “(A technical specification) that permits large production runs of component parts that are readily fitted to other parts without adjustment”, a definition driven by the idea that manufacturing is helped by the availability of different but compatible suppliers. Closer to my view of standard is another Webster’s definition “a conspicuous object (as a banner) formerly carried at the top of a pole and used to mark a rallying point especially in battle or to serve as an emblem” driven by the idea that everybody can develop a standard but its adoption depends on how satisfactory the proposed standard is to its intended users.

In many cases a standard is the result of the effort spent by a group of people who believe their interests are best served by agreeing to do certain things in a certain way. Agreeing on a standard may require a big effort (in MPEG developing a standard may cost tens of millions of USD to participating companies), but that is nothing compared to the effort required by convincing “other people” that the standard is what they need.

In this article I will present some of the efforts that MPEG has done over its 30+ years to convince “other people” that MPEG standards are what they need.

Convincing other people to adopt a standard is a process

If you think that convincing other industries to adopt an MPEG standard just requires a good marketing effort once the standard is done, you are missing the point. Anybody can put together a decent technical standard. Convincing other industries is a process that accompanies the development of the standard, starting from the moment the idea of a new standard takes shape.

In the early 1990’s all instances of the broadcasting industry – terrestrial, satellite and cable – were technically convinced that digital television was superior to analogue television. There were two problems, however. The first problem was that in some countries the industry espoused digital as an ally while in other countries the industry rejected it as a threat. The second problem was that there were solutions here and there and some attempts at developing standards, but the solutions were proprietary and the attempts at standards were often fraught with rivalries. MPEG had achieved some notoriety with its first (MPEG-1) standard but had to acquire a new credibility vis-à-vis an industry that, already at that time, was worth more than 100 B$ p.a. and was understandably cautious with its decisions.

MPEG succeeded in convincing the broadcasting industry, even its reluctant segments, namely the European terrestrial broadcasting industry. The deal was to offer its Requirements group as the place where the individual industry segments could express their needs and see them influence the technical developments. Unlike the approach of other bodies, where a coalition of interests often blocks the requests from other groups based on the mantra “I cannot support this because my business is negatively affected”, MPEG took the opposite approach. All requests were discussed to understand whether they were new or could be folded into previous requests. The space of technical solutions was partitioned into profiles and levels to accommodate requests without negatively affecting others. Finally, when the MPEG-2 standard was completed, MPEG carried out Verification Tests which showed that 6 Mbit/s yielded “composite quality” of standard definition TV and 8 Mbit/s yielded “component quality”.

Credibility is not granted for ever

In the mid-1990’s MPEG had achieved the impossible. It had brought together all segments of the television industry, the package media industry included, and was addressing the studio needs that it satisfied with its 4:2:2 profile. MPEG, however, did not intend to be just the technical arm of the television industry (which, by the way, included audio as well). MPEG intended to fully execute the mission implied by its title “Coding of moving pictures and audio” which meant the “information representation layer” for whatever application domain.

In the mid-1990’s the role that the internet would play in media distribution was not at all clear, nor was the role that mobile networks would play. It was clear, however, that other delivery mechanisms would play a role. These mechanisms were characterised by “low bitrate”, “best effort” etc.

In hindsight, trying to extend the hard-won role in the broadcasting industry to this unknown land was a very bold move. That field was antithetical to what MPEG had done so far, namely high quality and (to some extent) guaranteed delivery. Emblematic was the hot discussion around the MPEG-2 transport packetisation, which was opposed by old-style experts accustomed to relying on frame structures. More important was the fact that new industries, represented by the ICT (Information and Communication Technologies) acronym, would play a major role.

MPEG made a big effort to adapt to the new environment. For instance, it developed the software copyright disclaimer. The disclaimer eventually became a modified BSD (Berkeley Software Distribution) licence, where the modification is an explicit statement that the release of the software copyright does not imply a release of patents. Another effort was to develop the file format, which became the cornerstone on which the MPEG role in the ICT world was built.

A track record of collaborations

In 30+ years of standards development MPEG has established cooperation with many standards bodies and industry fora. In this chapter I will review some of the most outstanding and fruitful collaborations.

Broadcasting

MPEG has developed standards for broadcasting since its early days (DAB – Digital Audio Broadcasting – was one application driving MPEG-1 Audio), and broadcasting continues to be a major customer to this day. An indicative list of standards groups and industry fora MPEG interacts with includes ABU – Asia-Pacific Broadcasting Union, ATSC – Advanced Television Systems Committee, DVB – Digital Video Broadcasting, EBU – European Broadcasting Union, ITU-R SG 6 – Broadcasting Service (terrestrial and satellite), ITU-T SG 9 – Television and sound transmission, SCTE – Society of Cable Telecommunications Engineers and DTG – Digital TV Group.

ATSC has adopted MPEG-2 Video, AVC and HEVC. In addition to these standards, DVB has also adopted MPEG-1 and MPEG-2 Audio and AAC. MPEG has referenced a DVB specification for its Media Orchestration standard (MPEG-B part 13). ITU-R and SCTE have adopted several MPEG standards.

Telecommunication

MPEG-1 was driven by the idea of interactive audio-visual services at the bitrate that telcos used to call primary rate (1.5/2 Mbit/s), which was expected to be offered by ADSL – Asymmetric Digital Subscriber Line. Intense interaction with that industry began with MPEG-2, which is common text, meaning that MPEG-2 Systems and MPEG-2 Video are verbatim the same as H.222.0 and H.262, respectively. The tight collaboration of MPEG with ITU-T SG 16 – Multimedia services and systems – continued with AVC and HEVC, and continues with VVC. The 3 standards are “aligned text”, which means that the standards are technically equivalent but not editorially the same. Other related standards, such as MPEG-C Part 7 – Supplemental enhancement information messages for coded video bitstreams – and MPEG-CICP Part 2 – Video – and Part 4 – Usage of video signal type code points – are also aligned text. MPEG is also liaising with ITU-T SG 12 – Performance, QoS and QoE.

MPEG has an ongoing intense collaboration with 3GPP – the Third Generation Partnership Project, an international organisation issuing standards for the mobile industry. 3GPP has adopted many MPEG standards such as AVC, HEVC, AAC, MP4 File Format and DASH. MPEG is also liaising with ETSI – European Telecommunication Standards Institute.

Other media-related areas

The world of media is quite articulated, and MPEG takes care of establishing contacts, developing standards for, or using standards from, different environments.

In the area of audio, MPEG has a long-standing liaison with AES – Audio Engineering Society and with SMPTE. MPEG is referencing several SMPTE standards, e.g. those related to HDR – High Dynamic Range.

AVS – Audio and Video Coding Standard Workgroup of China is a group developing audio-visual compression standards for the Chinese market. MPEG has a liaison with AVS.

Immersive media is the future but it is unclear what the future will exactly be. MPEG has developed ARAF – Augmented Reality Application Format and has developed ISO/IEC 21858 – Information model for mixed and augmented reality (MAR) contents jointly with SC 24 – Computer graphics, image processing and environmental data representation.

Fonts

Since the early 2000s, MPEG has taken over the baton of the OpenType specification. OpenType was an open specification originally developed by Adobe, Apple and Microsoft. MPEG-OFF – Open Font Format – is a standard that is universally used wherever there are displays that are expected to present fonts.

MPEG is liaising with SC 34 – Document Description and Processing Languages on the matter of fonts.

Information transport

When it developed MPEG-2 Video, MPEG already had the experience of the transport standard that it had developed for MPEG-1. MPEG-2 broadcasting applications could not rely on the assumption that the communication channel was error-free, and MPEG had to develop a new standard that it called MPEG-2 Transport Stream (MPEG-2 Systems also defines another transport, called MPEG-2 Program Stream, akin to MPEG-1 Systems). MPEG-2 Systems is one of the most successful MPEG standards as it is used by broadcasting in all its forms (ATSC, DVB, BDA and, before that, DVD – Digital Versatile Disc, etc.) and is used as a package in IPTV.

ATSC has adopted the full Audio-Video-Systems package offered by MPEG-H. The MPEG-H Systems layer is called MPEG Media Transport (MMT).

The Common Media Application Format (CMAF) is another successful transport standard.

MPEG has developed the MPEG-2 transport standard for sequences of JPEG 2000 and JPEG XS images.

Another successful media transport standard is MPEG-DASH. This has been adopted by 3GPP, ATSC, DVB and others.

Manufacturing industry

In addition to DAB, MPEG-1 was driven by the idea of a standard for audio-visual applications on CD – compact disc. This was eventually adopted by the industry under the name of Video CD. It was also driven by the idea of a new digital audio distribution format on CC – compact cassette. The CE – Consumer Electronics – industry has tight contacts with MPEG via IEC TC 100 – Audio, Video and Multimedia Systems and Equipment. MPEG is also working with CTA – Consumer Technology Association. MPEG liaises with BDA – Blu-ray Disc Association, and BDA has adopted several MPEG standards such as AVC and HEVC.

Genomics

MPEG has developed the 3 parts of MPEG-G – Genomic Information Representation – jointly with WG 5 – Data processing and integration – of TC 276 – Biotechnology. It is at the last stages of approval of Parts 4 and 5 and is developing, again jointly with TC 276/WG 5, Part 6 – Genomic Annotation Representation.

In addition to TC 276, MPEG is liaising with TC 215 – Health Informatics and with GA4GH –Global Alliance for Genomics and Health.

Internet of Things

Internet of Things per se is no business for MPEG because SC 41 – Internet of Things is in charge of standardisation in this area. MPEG liaises with SC 41.

MPEG has identified a specific instance of Internet of Things that it calls Internet of Media Things. This considers the specific but important case of a thing that is a camera or a microphone, a display or a loudspeaker, a unit capable of analysing the media content etc. Part 1 of MPEG-IoMT – Architecture is an instance of the general IoT Architecture developed by SC 41.

Artificial Intelligence

Artificial Intelligence per se is no business for MPEG because SC 42 – Artificial Intelligence is in charge of it. MPEG liaises with SC 42.

MPEG has used an AI technology – Neural Networks – for MPEG-7 Part 15 – Compact Descriptors for Video Analysis (CDVA), is working on MPEG-NNR, Part 17 of MPEG-7 – Compression of neural networks for multimedia content description and analysis. MPEG also plans to make intense use of neural networks for its future Video Coding for Machines standard.

MPEG is also investigating the connections between its Network-Based Media Processing (NBMP) standard, released as FDIS at the January 2020 meeting, and Big Media. Again, MPEG has no business in Big Media, an area of work for SC 42, but NBMP is likely to become an instance of the general Big Media Reference Model developed by SC 42.

Transportation

Data compression seems to have little to do with transportation, but this area of endeavour is more and more influenced by technologies mastered by MPEG. For instance, 3GPP is considering V2X (Vehicle-to-everything) communication, where information moves from a vehicle to any entity that may have a relationship with the vehicle, and vice versa. Specific forms of communication are: V2I (vehicle-to-infrastructure), V2N (vehicle-to-network), V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian), V2D (vehicle-to-device) and V2G (vehicle-to-grid).

Audio-visual information is clearly a major user of all such communication forms, and MPEG standards are, and will increasingly be, the main sources. Two examples are G-PCC – Geometry-based Point Cloud Compression and VCM – Video Coding for Machines.

MPEG is liaising with ISO TC 22 – Road Vehicles and TC 204 – Intelligent Transport Systems.

Conclusions

Standards lubricate our complex society and allow it to function and make progress. Developing standards is easy, but making sure that standards are adopted is difficult.

MPEG has been successful with the latter because it takes a holistic, end-to-end approach to standardisation where its partners and customers – standards bodies and industry fora – are part of standard development.


MPEG status report (Jan 2020)

Introduction

In the week of the 13th of January, the Free University of Brussels hosted the 129th MPEG meeting. Two days (11-12) were dedicated to some 15 ad hoc group meetings and 6 days (7-12) to meetings of JVET, the joint MPEG-SG 16 group tasked to develop the VVC standard.

In this status report I will highlight some of the most relevant topics on which progress was made. The figure below captures the essence of the MPEG work plan as it resulted from the meeting.

Versatile Video Coding (VVC)

VVC (part 3 of MPEG-I) is being balloted and the ballot results are expected to be received at the July meeting so that MPEG can approve VVC as FDIS.

MPEG is now working on two related standards that are important for practical deployment: Carriage in MPEG-2 TS (Amendment 2 of MPEG-2 Systems) and Carriage in ISOBMFF (Amendment 2 of MPEG-4 part 15), both expected to be approved in January 2021.

Another activity around VVC is called Multi-Decoder Video Interface for Immersive Media (part 13 of MPEG-I). This aims to support the flexible use of media decoders, for example decoding only a subset of a single elementary stream. This feature is required for processing immersive media composed of a large number of elementary streams.

Essential Video Coding (EVC)

EVC (part 1 of MPEG-5) addresses the needs that have become apparent in some use cases, such as video streaming, where existing ISO video coding standards have not been as widely adopted as might be expected from their purely technical characteristics. EVC is still under ballot and results are expected to become available at the April 2020 meeting (MPEG 130).

The group in charge of EVC has started considering Carriage of EVC in MPEG Systems.

Low Complexity Enhancement Video Coding (LCEVC)

LCEVC will provide a standardised video coding solution that leverages other video codecs in a manner that improves video compression efficiency while maintaining or lowering the overall encoding and decoding complexity. LCEVC will reach DIS in April 2020.

MPEG Immersive Video (MIV) and Video-based Point Cloud Compression (V-PCC)

Part 12 of MPEG-I – Immersive Video (MIV) shares with Part 5 of MPEG-I – Video-based Point Cloud Compression (V-PCC) the notion of projecting a 3D scene onto a series of planes, compressing the 2D visual information on those planes with off-the-shelf video compression standards, and providing the means for a 3D renderer to use the information contained in the atlases (in the case of MIV) and the patches (in the case of V-PCC). Outstanding convergence of the two approaches has been reached.
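The projection idea shared by the two standards can be illustrated with a toy sketch. This is not the V-PCC or MIV algorithm – both define far more elaborate patch generation and packing – it only shows the basic step of mapping 3D points onto a 2D plane whose content a conventional video encoder could then compress:

```python
# Illustrative sketch (not the actual V-PCC/MIV algorithm): orthographically
# project a 3D point cloud onto the XY plane, keeping the nearest depth per
# pixel. The resulting depth map is 2D data a video encoder could compress.

def project_to_plane(points, resolution=1.0):
    """Map (x, y, z) points to a {(px, py): z} depth map, keeping the
    smallest z (closest to the projection plane) for each pixel."""
    depth_map = {}
    for x, y, z in points:
        pixel = (int(x / resolution), int(y / resolution))
        if pixel not in depth_map or z < depth_map[pixel]:
            depth_map[pixel] = z
    return depth_map

cloud = [(0.2, 0.3, 5.0), (0.4, 0.1, 3.0), (1.2, 0.3, 7.0)]
print(project_to_plane(cloud))  # two pixels; (0, 0) keeps the nearer depth 3.0
```

A real codec projects onto several planes and transmits the metadata a renderer needs to invert the projection.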

V-PCC will reach FDIS in April 2020 and MIV in January 2021. Both will have extensions, the latter to enable the ambitious, but needed, 6 degrees of freedom (6DoF) where the user can move in 6 directions.

The MPEG-4 File Format is being extended to include V-PCC and G-PCC data.

Video Coding for Machines (VCM)

VCM is an exploration on a new type of video coding designed to provide efficient representation of video information where the user is not human but a machine, with possible support of viewing by humans. Possible use cases for VCM are video surveillance, intelligent transportation, automatic driving and smart cities.

MPEG has produced a Draft Call for Evidence designed to acquire information on the feasibility of a Video Coding for Machines standard. For this purpose MPEG has published a Call for Test Data for Video Coding for Machines. Test data will be used to assess the responses to the Call for Evidence.

Neural Network-based Audio-Visual Compression

VVC and EVC will support the media industry by providing more compression for transmission and storage. They are both the current endpoints of a compression scheme that dates back to the mid-1980s. Similarly, MPEG-H 3D Audio is the current endpoint of the compression scheme initiated in 1997 with MPEG-2 AAC.

Today, as a result of the demonstration provided in recent years that neural networks can outperform other “traditional” algorithms in selected areas, many laboratories are carrying out significant research on the use of neural networks for coding of audio and visual signals as well as point clouds.

MPEG is calling its members to provide information on this new area of endeavour.

MPEG Immersive Audio

MPEG has produced a Draft CfP for Immersive Audio. The actual CfP will be issued in April 2020 and submissions are requested for July 2020. FDIS is planned for January 2022.

Neural Network Compression for Multimedia

Neural networks are used in multimedia applications, such as speech understanding and image recognition. Industry, however, is coming to the conclusion that the IT infrastructure may well be unable to cope with the growth of users and that in many cases intelligence is best distributed to the edge. As the size of some of these networks is hundreds of GBytes or even TBytes, compression of neural networks can support distribution of intelligence to potentially millions of devices. See the figure below.
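The size/precision trade-off at the heart of network compression can be shown with a minimal example. The MPEG standard defines much more sophisticated tools; the sketch below only demonstrates the general idea with uniform 8-bit quantisation, which shrinks 32-bit float weights by roughly 4x at the cost of a small reconstruction error:

```python
# Illustrative sketch, not the MPEG neural network compression toolset:
# uniform 8-bit quantisation of floating-point weights.

def quantise(weights, bits=8):
    """Map floats to integers in [0, 2**bits - 1]; return the integers
    plus the (min, step) parameters needed to dequantise."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (2 ** bits - 1) or 1.0  # avoid step == 0
    return [round((w - lo) / step) for w in weights], (lo, step)

def dequantise(q, params):
    lo, step = params
    return [lo + v * step for v in q]

w = [-0.51, 0.0, 0.27, 0.98]
q, params = quantise(w)
# Reconstruction error is bounded by half a quantisation step.
print(max(abs(a - b) for a, b in zip(w, dequantise(q, params))))
```

Each weight now needs 8 bits instead of 32; over millions of parameters this makes edge distribution far cheaper.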

MPEG is progressing its work on the Compression of neural networks for multimedia content description and analysis standard. This is expected to reach CD status in January 2021.

Network-Based Media Processing (NBMP)

NBMP reached FDIS in January 2020. The standard defines a framework for content and service providers to describe, deploy, and control media processing. The framework includes an abstraction layer deployable on top of existing commercial cloud platforms and able to be integrated with 5G core and edge computing. The NBMP workflow manager enables composition of multiple media processing tasks to process incoming media and metadata from a media source and to produce processed media streams and metadata that are ready for distribution to media sinks.
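The workflow composition described above can be pictured with a hypothetical sketch. The actual NBMP Workflow Description Document has a different, richer JSON schema; the keys and function names below are invented purely for illustration:

```python
import json

# Hypothetical workflow description in the spirit of NBMP (the real
# Workflow Description Document schema differs): a media source, a chain
# of processing tasks, and a sink.
workflow = {
    "source": {"id": "camera-1", "format": "raw-video"},
    "tasks": [
        {"id": "transcode", "function": "encode-video"},
        {"id": "package", "function": "dash-packager"},
    ],
    "sink": {"id": "cdn-origin", "protocol": "dash"},
}

# A workflow manager would walk the task list in order, deploying each
# task on the cloud platform and connecting its output to the next input.
order = [t["id"] for t in workflow["tasks"]]
print(json.dumps(order))  # ["transcode", "package"]
```

The point of the abstraction layer is that the same description can be deployed on different commercial cloud platforms.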

MPEG is exploring how NBMP can become an instance of the Big Media reference model developed by SC 42 – Artificial Intelligence.

Compression of Genomic Annotations

At the January 2020 meeting in Brussels MPEG received 7 submissions in response to the joint Call for Proposals issued by MPEG and ISO TC 276/WG 5 on the efficient representation of annotations to sequencing data resulting from analysis pipelines. MPEG has started working on a set of core experiments with the goal of integrating the proposed technologies into a single standard specification capable of satisfying all identified requirements and supporting a rich variety of queries.

FDIS is expected to be reached in January 2021.

MPEG and 5G

MPEG compression standards are mostly designed to represent information in an abstract way. However, the great success of MPEG standards is also due to the effort MPEG spent in providing the means to convey compressed information. 5G is being deployed and MPEG is investigating if and how its standards can be affected by 5G.

MPEG-21 Contracts to Smart Contracts

Blockchains offer an attractive way to execute electronic contracts. The problem is that there are many blockchains each with their own way of expressing the terms of a contract. MPEG considers that MPEG-21 can be the intermediate language in which smart contracts for different blockchains can be expressed.

One application addresses the following use case. There is no easy way to deduce, from the executable form of a smart contract, the clauses that the smart contract contains. Publishing the human-readable contract alleviates the concern, but does not ensure that the clauses of the human-readable contract correspond to those of the smart contract.

The figure below describes how the counterparty to a smart contract can know the clauses of the smart contract in a human-readable form.
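One basic building block for binding the two representations can be sketched as follows. This is not the MPEG-21 mechanism – which envisages a richer mapping between contract clauses and smart-contract code – it only illustrates the integrity-check idea: storing a cryptographic hash of the human-readable text alongside the smart contract, so any alteration of the published text becomes detectable:

```python
import hashlib

# Illustrative sketch (not the MPEG-21 mechanism): commit to the
# human-readable contract text with a SHA-256 hash recorded alongside
# the smart contract.

def commit(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical contract text, for illustration only.
published = "The licensor grants a non-exclusive licence ..."
stored_hash = commit(published)          # recorded with the smart contract

# Later, the counterparty re-hashes the text they were given:
print(commit(published) == stored_hash)                   # True
print(commit(published + " extra clause") == stored_hash)  # False
```

The hash proves the published text is unchanged; mapping individual clauses to smart-contract code is the harder problem MPEG-21 aims at.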
