Switching to ExoPlayer: Better Video on Android

How does one start implementing a working ExoPlayer solution? In this talk from 360|AnDev, Effie Barak walks us through Udemy’s transition from MediaPlayer to ExoPlayer, covering the basics of how to make the switch. She also touches on extending ExoPlayer to implement advanced video features, such as background playing, variable playback speeds, subtitles, and different playback resolutions.


Introduction (0:00)

When Udemy first wrote its Android app, it used MediaPlayer. Six months ago, we decided to change to ExoPlayer.

ExoPlayer is an application-level media player for Android that provides an alternative to MediaPlayer for playing video. It is an open source library made by Google (not part of the Android framework itself), written in Java, and it relies on Android's low-level media APIs.

The Udemy app is used to view courses and video lectures, so the media part is one of the core functionalities of our app. It was important for us to use something reliable and stable. We also needed something customizable, to be able to add functionality beyond the basic play, pause, and next functions.

MediaPlayer vs. ExoPlayer (1:20)

mediaPlayer.setDataSource(url);
mediaPlayer.prepare();
mediaPlayer.start();

MediaPlayer has advantages. The main one is that it is easy to get started with. You only need these three lines of code to play most files.

The disadvantage is that it is neither customizable nor extensible. Over time, our needs at Udemy grew beyond "play this video" to include more features, such as support for HLS and variable playback speeds. We could not do any of that with MediaPlayer out of the box.

MediaPlayer has other disadvantages: it is a black box, implemented natively, so it is nearly impossible to debug or to figure out exactly which exceptions are happening. MediaPlayer is also part of the OS installed on the phone, so its implementation and behavior can differ across OS versions and devices. We cannot control or guarantee exactly which version will be used. Many of the states can be buggy, and it is hard to debug them. MediaPlayer reports cryptic error codes, and we could not see for certain which crashes were happening.

ExoPlayer solves all the aforementioned problems. It is very extensible, but it has a steeper learning curve. It is open source, written in Java, easy to debug, and easy to read. It is built on top of MediaCodec, which is a lower-level API, and it handles HLS correctly. It supports background playing, subtitles, variable playback speed, and variable resolutions.

ExoPlayer Basics (3:55)

Because ExoPlayer is written in Java, whenever you are not sure how something works or what a variable means, it is easy to dive into the code and see exactly what is happening and where or why something is used.

The sample app provided by Google is a good place to start. It has the basic architecture that you can easily adopt and most of the implementation there is good and applicable.

The default implementations for classes are good implementations, even though we can technically extend or write our own things. You can use the defaults and everything will work fine.

An ExoPlayer implementation requires a little more code to get started and play videos. The work can be roughly separated into two parts: the player, which is controlled through the UI to perform the big actions (e.g. play, stop, next), and the core parts that fetch the stream, decode it, and process it for playing.

The Player (5:16)

player = ExoPlayer.Factory.newInstance(
  PlayerConstants.RENDERER_COUNT,
  MIN_BUFFER_MS,
  MIN_REBUFFER_MS);

playerControl = new PlayerControl(player);

This is an example of how a player is instantiated. PlayerControl is the default component used to communicate with the player.

In order to get information from other parts of the engine to the player and for it to respond accordingly (for example by adding retry cases in case of failure), the player can listen to a bunch of events from the core engine components:

public abstract class UdemyBaseExoplayer
  implements ExoPlayer.Listener,
    ChunkSampleSource.EventListener,
    HlsSampleSource.EventListener,
    DefaultBandwidthMeter.EventListener,
    MediaCodecVideoTrackRenderer.EventListener,
    MediaCodecAudioTrackRenderer.EventListener
player.addListener(this);

The Core (6:16)

The core is simpler when we are playing non-adaptive stream types (such as MP3s or MP4s), and it gets more complex with HLS and DASH.

Here is “the journey of a stream”: it is fetched from the server by the default URI data source and handed to the ExtractorSampleSource, which figures out the correct extractor to use (in the case of MP4, it is Mp4Extractor). The extractor opens the file and pulls out the audio and video samples. The renderers decode those samples into raw audio and video and feed them to the player, and the player plays them.

However, it can also be seen the other way around: the player starts asking the renderers for buffers, and they go back and say, “We don’t have anything. ExtractorSampleSource, give me something,” and ExtractorSampleSource might say, “I need to fetch something. I will use the default URI data source to do it.”

The ExoPlayer setup below looks like a lot of code at first glance. Do we need all of it to get going?

Allocator allocator = new DefaultAllocator(BUFFER_SEGMENT_SIZE);
Handler mainHandler = player.getMainHandler();

DefaultBandwidthMeter bandwidthMeter = new DefaultBandwidthMeter(mainHandler, null);
DataSource dataSource = new DefaultUriDataSource(
  context, bandwidthMeter, Util.getUserAgent(mContext, Constants.UDEMY_NAME));
ExtractorSampleSource sampleSource = new ExtractorSampleSource(uri, dataSource, allocator,
  BUFFER_SEGMENT_COUNT * BUFFER_SEGMENT_SIZE, mainHandler, player, 0);
MediaCodecVideoTrackRenderer videoRenderer = new MediaCodecVideoTrackRenderer(context,
  sampleSource, MediaCodecSelector.DEFAULT, MediaCodec.VIDEO_SCALING_MODE_SCALE_TO_FIT, 5000,
  mainHandler, player, 50);
MediaCodecAudioTrackRenderer audioRenderer = new MediaCodecAudioTrackRenderer(sampleSource,
  MediaCodecSelector.DEFAULT, null, true, mainHandler, player,
  AudioCapabilities.getCapabilities(context), AudioManager.STREAM_MUSIC);

TrackRenderer[] renderers = new TrackRenderer[PlayerConstants.RENDERER_COUNT];
renderers[PlayerConstants.TYPE_VIDEO] = videoRenderer;
renderers[PlayerConstants.TYPE_AUDIO] = audioRenderer;
player.onRenderers(renderers, bandwidthMeter);

The answer is yes, but don’t think about it too hard because most of it is default. I copied this directly from the demo app, and this will work. There is no need to spend too much time on figuring out exactly what each and every part here does.

// Allocator allocator = new DefaultAllocator(BUFFER_SEGMENT_SIZE);
// Handler mainHandler = player.getMainHandler();
// DefaultBandwidthMeter bandwidthMeter = new DefaultBandwidthMeter(mainHandler, null);
// DataSource dataSource = new DefaultUriDataSource(context, bandwidthMeter,
//                         Util.getUserAgent(mContext, Constants.UDEMY_NAME));

ExtractorSampleSource sampleSource = new ExtractorSampleSource(uri, dataSource, allocator,
  BUFFER_SEGMENT_COUNT * BUFFER_SEGMENT_SIZE, mainHandler, player, 0);

MediaCodecVideoTrackRenderer videoRenderer = new MediaCodecVideoTrackRenderer(context,
  sampleSource, MediaCodecSelector.DEFAULT, MediaCodec.VIDEO_SCALING_MODE_SCALE_TO_FIT, 5000,
  mainHandler, player, 50);

MediaCodecAudioTrackRenderer audioRenderer = new MediaCodecAudioTrackRenderer(sampleSource,
  MediaCodecSelector.DEFAULT, null, true, mainHandler, player,
  AudioCapabilities.getCapabilities(context), AudioManager.STREAM_MUSIC);

TrackRenderer[] renderers = new TrackRenderer[PlayerConstants.RENDERER_COUNT];
renderers[PlayerConstants.TYPE_VIDEO] = videoRenderer;
renderers[PlayerConstants.TYPE_AUDIO] = audioRenderer;
player.onRenderers(renderers, bandwidthMeter);

The DefaultUriDataSource is given to the ExtractorSampleSource, which in turn is given to the video and audio renderers. We put the renderers in this array and then give it to the player to use.
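To make the direction of the requests concrete, here is a toy model of the pull chain described earlier: the renderer asks the sample source, and the sample source only goes to the data source when it has nothing queued. This is a minimal sketch under stated assumptions. None of these classes are ExoPlayer classes; `ToyDataSource`, `ToySampleSource`, and `ToyRenderer` are hypothetical stand-ins, and "two samples per chunk" is an arbitrary choice for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class PullPipelineSketch {
    /** Records the order in which the stages are asked for data. */
    static final List<String> log = new ArrayList<>();

    /** Hypothetical stand-in for a data source: hands out a fixed number of chunks, then null. */
    static class ToyDataSource {
        private int remaining;
        ToyDataSource(int chunks) { remaining = chunks; }
        String read() {
            log.add("dataSource.read");
            return remaining-- > 0 ? "chunk" : null;
        }
    }

    /** Hypothetical stand-in for a sample source: fetches only when its queue is empty. */
    static class ToySampleSource {
        private final ToyDataSource dataSource;
        private final Deque<String> samples = new ArrayDeque<>();
        ToySampleSource(ToyDataSource dataSource) { this.dataSource = dataSource; }
        String nextSample() {
            log.add("sampleSource.nextSample");
            if (samples.isEmpty()) {
                String chunk = dataSource.read();
                if (chunk == null) return null; // end of stream
                // Pretend every chunk is extracted into two samples.
                samples.add(chunk + "-sample0");
                samples.add(chunk + "-sample1");
            }
            return samples.poll();
        }
    }

    /** Hypothetical stand-in for a renderer: asks the sample source whenever a buffer is wanted. */
    static class ToyRenderer {
        private final ToySampleSource source;
        ToyRenderer(ToySampleSource source) { this.source = source; }
        String nextBuffer() {
            log.add("renderer.nextBuffer");
            return source.nextSample();
        }
    }

    /** Plays the "player" role: keeps pulling buffers until the stream ends. */
    static List<String> playAll(int chunks) {
        log.clear();
        ToyRenderer renderer = new ToyRenderer(new ToySampleSource(new ToyDataSource(chunks)));
        List<String> played = new ArrayList<>();
        String buffer;
        while ((buffer = renderer.nextBuffer()) != null) {
            played.add(buffer);
        }
        return played;
    }
}
```

Calling `PullPipelineSketch.playAll(1)` and then inspecting `PullPipelineSketch.log` shows that the data source is only touched when the sample source runs dry, which is the "other way around" view of the stream's journey.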

Build Extractors

DefaultUriDataSource uriDataSource = new DefaultUriDataSource
    (context, bandwidthMeter, userAgent);
ExtractorSampleSource sampleSource = new ExtractorSampleSource
    (uri, uriDataSource, allocator,
    PlayerConstants.BUFFER_SEGMENT_COUNT * PlayerConstants.BUFFER_SEGMENT_SIZE);

Build Renderers

TrackRenderer[] renderers =
    new TrackRenderer[PlayerConstants.RENDERER_COUNT];
    
MediaCodecAudioTrackRenderer audioRenderer =
  new MediaCodecAudioTrackRenderer(
    sampleSource, MediaCodecSelector.DEFAULT,
      null, true, player.getMainHandler(), player,
    AudioCapabilities.getCapabilities(context),
    AudioManager.STREAM_MUSIC);
    
renderers[PlayerConstants.TYPE_AUDIO] = audioRenderer;

Connect Renderers to the Player

player.prepare(renderers);

Udemy customizations to basic structure (8:53)

At Udemy, we made some customizations to the basic structure. For example, we decreased the buffer size and the segment size after getting some out-of-memory exceptions on low-end devices.

public static final int BUFFER_SEGMENT_SIZE = 16 * 1024; // Original value was 64 * 1024
public static final int VIDEO_BUFFER_SEGMENTS = 50; // Original value was 200
public static final int AUDIO_BUFFER_SEGMENTS = 20; // Original value was 54
public static final int BUFFER_SEGMENT_COUNT = 64; // Original value was 256
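To see what the smaller values buy, it helps to work out the buffer size that is requested from the ExtractorSampleSource, which in the setup code is `BUFFER_SEGMENT_COUNT * BUFFER_SEGMENT_SIZE`. The following is a small arithmetic sketch; the `BufferBudget` class and its `totalBytes` helper are names made up for this illustration.

```java
class BufferBudget {
    /** Requested buffer size in bytes: segment size times segment count. */
    static long totalBytes(int segmentSize, int segmentCount) {
        return (long) segmentSize * segmentCount;
    }

    public static void main(String[] args) {
        long original = totalBytes(64 * 1024, 256); // original values
        long reduced = totalBytes(16 * 1024, 64);   // values after our change
        System.out.println("original: " + original / (1024 * 1024) + " MiB"); // prints "original: 16 MiB"
        System.out.println("reduced: " + reduced / (1024 * 1024) + " MiB");   // prints "reduced: 1 MiB"
    }
}
```

With these numbers the requested buffer drops from 16 MiB to 1 MiB, a sixteenfold reduction.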

By default, the ExtractorSampleSource goes over all the available extractors and selects the correct one to use. Because we already knew what type of file and what type of media we were playing, we could give it only the extractors we wanted and skip the part where it tries every extractor to figure out exactly what type of stream this is. With the configuration below, it will play an MP4.

mp4Extractor = new Mp4Extractor();
mp3Extractor = new Mp3Extractor();
sampleSource = new ExtractorSampleSource(..., mp4Extractor, mp3Extractor);

HLS (9:42)

For HLS, things get a little more complicated. HLS is a stream that is broken into a sequence of small file downloads, each one a short chunk of the overall stream. As the stream is played, the client may select from a number of alternate streams that are encoded at different data rates, which allows the session to switch between them during playback based on logic such as available bandwidth.

At the start of a streaming session, the client downloads an extended M3U8 playlist, which is a text file that contains metadata for the various substreams that are available. For HLS, ExoPlayer uses a new kind of source that decides which chunk to use based on the available bandwidth and handles switching between those chunks.

The rest of the stream is similar, and all the decision making about which stream quality to choose and switching between streams is handled by HlsChunkSource and HlsSampleSource. We do not have to write custom logic for that unless we are unhappy with the defaults.
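To give a feel for what that decision involves, here is a minimal sketch that reads the `BANDWIDTH` attribute of each `#EXT-X-STREAM-INF` entry in a master playlist and picks the highest-bandwidth variant that fits the measured bandwidth. This is not ExoPlayer's actual selection logic, which takes more into account; `HlsVariantSketch`, `Variant`, `parse`, and `pick` are hypothetical names, and the parser ignores every other playlist tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HlsVariantSketch {
    /** One substream listed in the master playlist. */
    static final class Variant {
        final long bandwidth; // bits per second, from the BANDWIDTH attribute
        final String uri;
        Variant(long bandwidth, String uri) {
            this.bandwidth = bandwidth;
            this.uri = uri;
        }
    }

    // Requires ':' or ',' before the name so AVERAGE-BANDWIDTH is not matched by mistake.
    private static final Pattern BANDWIDTH = Pattern.compile("[:,]BANDWIDTH=(\\d+)");

    /** Collects (bandwidth, uri) pairs: each STREAM-INF tag applies to the next URI line. */
    static List<Variant> parse(String playlist) {
        List<Variant> variants = new ArrayList<>();
        long pendingBandwidth = -1;
        for (String raw : playlist.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("#EXT-X-STREAM-INF")) {
                Matcher m = BANDWIDTH.matcher(line);
                pendingBandwidth = m.find() ? Long.parseLong(m.group(1)) : -1;
            } else if (!line.isEmpty() && !line.startsWith("#") && pendingBandwidth >= 0) {
                variants.add(new Variant(pendingBandwidth, line));
                pendingBandwidth = -1;
            }
        }
        return variants;
    }

    /**
     * Returns the index of the highest-bandwidth variant that fits, falling back to the
     * lowest-bandwidth variant when none fits. Returns -1 for an empty list.
     */
    static int pick(List<Variant> variants, long availableBitsPerSecond) {
        int best = -1;
        int lowest = -1;
        for (int i = 0; i < variants.size(); i++) {
            long bandwidth = variants.get(i).bandwidth;
            if (lowest < 0 || bandwidth < variants.get(lowest).bandwidth) {
                lowest = i;
            }
            if (bandwidth <= availableBitsPerSecond
                    && (best < 0 || bandwidth > variants.get(best).bandwidth)) {
                best = i;
            }
        }
        return best >= 0 ? best : lowest;
    }
}
```

For a playlist with variants at 800 kbps, 2 Mbps, and 5 Mbps, `pick` with 2.5 Mbps available returns the index of the 2 Mbps variant.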

DefaultBandwidthMeter bandwidthMeter = new DefaultBandwidthMeter();
PtsTimestampAdjusterProvider timestampAdjusterProvider =
  new PtsTimestampAdjusterProvider();
HlsChunkSource chunkSource = new HlsChunkSource(...,
  uriDataSource,
  url,...,
  bandwidthMeter,
  timestampAdjusterProvider,
  HlsChunkSource.ADAPTIVE_MODE_SPLICE);

This is the instantiation of the new HlsChunkSource. It uses a bandwidthMeter to calculate the currently available bandwidth and a PtsTimestampAdjusterProvider to adjust the PTS timestamps contained in a chunk with a given discontinuity sequence. (If you don’t know what that means, that’s fine; I am not sure I do either.)

You can also choose how you want adaptive mode to be handled, which is the last parameter given to HlsChunkSource class. You can choose between none, splice, and abrupt.

none means that whatever was selected when the video was started is going to stay the same throughout the play.

splice means that while switching between chunks, it will download the old one and the new one, in case the previous one does not end exactly where the new one starts. That helps, because the user otherwise might see a glitch in the video when switching between chunks.

abrupt means that it will download the new chunk but not the old one. If the time stamps do not match, then the user will see a glitch when switching.

sampleSource =
    new HlsSampleSource(chunkSource, ...);

This HlsSampleSource takes the new chunkSource that we created. In the debugger, you can see how HlsChunkSource has taken our playlist file and composed a list of possible streams to choose from, and how it holds that information.

At Udemy, we needed to override this behavior. We sometimes wanted to give the user the option to say “I want to use this quality and nothing else,” so we had to customize HLS to not change qualities dynamically. ADAPTIVE_MODE_NONE means that whatever quality was selected will not change throughout playback, and that covers a good part of what we wanted.

HlsChunkSource chunkSource =
  new HlsChunkSource(..., HlsChunkSource.ADAPTIVE_MODE_NONE);

We also need to be able to tell it which quality to choose to begin with. It starts with a default value, but what about changing it manually? That unfortunately involves setting selectedVariantIndex, a private int field that is used throughout this class, and particularly in getChunkOperation, which performs the bulk of the work in this class. That means that we needed to create our own version of this class to add this functionality, and that was sad.

// The index in variants of the currently selected variant.
private int selectedVariantIndex;

public void getChunkOperation(...) {
  int nextVariantIndex;
  ...
  if (adaptiveMode == ADAPTIVE_MODE_NONE) {
    nextVariantIndex = selectedVariantIndex;
    switchingVariantSpliced = false;
  } else {
    ...
  }
  ...
}
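The shape of the change can be sketched without any ExoPlayer code: keep the selected index in a field, expose a setter for it, and return it untouched when adaptation is off. This is a simplified illustration, not our actual class and not the ExoPlayer source. `VariantSelectorSketch`, its `Mode` enum, and `setSelectedVariantIndex` are hypothetical names, and the adaptive branch uses a deliberately naive "highest bandwidth that fits" rule.

```java
class VariantSelectorSketch {
    /** Hypothetical modes for this sketch; the real class uses constants such as ADAPTIVE_MODE_NONE. */
    enum Mode { NONE, ADAPTIVE }

    private final Mode mode;
    private final long[] variantBandwidths; // bits per second, one entry per variant
    private int selectedVariantIndex;

    VariantSelectorSketch(Mode mode, long[] variantBandwidths, int initialIndex) {
        this.mode = mode;
        this.variantBandwidths = variantBandwidths.clone();
        setSelectedVariantIndex(initialIndex);
    }

    /** The piece the private field does not offer: letting the UI pin a quality. */
    void setSelectedVariantIndex(int index) {
        if (index < 0 || index >= variantBandwidths.length) {
            throw new IndexOutOfBoundsException("No variant at index " + index);
        }
        selectedVariantIndex = index;
    }

    /** In NONE mode the pinned index always wins; otherwise pick by bandwidth. */
    int nextVariantIndex(long availableBitsPerSecond) {
        if (mode == Mode.NONE) {
            return selectedVariantIndex;
        }
        int best = -1;
        int lowest = 0;
        for (int i = 0; i < variantBandwidths.length; i++) {
            if (variantBandwidths[i] < variantBandwidths[lowest]) {
                lowest = i;
            }
            if (variantBandwidths[i] <= availableBitsPerSecond
                    && (best < 0 || variantBandwidths[i] > variantBandwidths[best])) {
                best = i;
            }
        }
        selectedVariantIndex = best >= 0 ? best : lowest;
        return selectedVariantIndex;
    }
}
```

In `Mode.NONE`, a user who pins the highest quality keeps it even when the measured bandwidth drops, which is the behavior we wanted to offer.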

Background Media Playing (14:48)

The Udemy app also supports a background media service. We want our app to continue playing in the background in audio-only mode while displaying a notification tile that allows the user to control the media. We already had this functionality when we were using MediaPlayer, but it was harder to do there: the service instantiated another instance of MediaPlayer, and we had to communicate information back and forth between the activity and the service all the time.

player.blockingSendMessage(
  videoRenderer,
  MediaCodecVideoTrackRenderer.MSG_SET_SURFACE,
  null);

ExoPlayer has background audio capability built in, which makes things much simpler. The first thing to do when switching to background audio is to clear the surface of the player so it is no longer attached to a view. That means the audio will keep playing in the background.

The code above sends a message to the player that sets the surface to null. If you try this in the sample ExoPlayer app, you will see that the background audio continues for a few seconds and then dies off. That is because the app’s process is being killed. Creating a service prevents that from happening.

Also in the service, we can create the notification tile and control the instance of the player.

<service android:name="com.udemy.android.player.exoplayer.UdemyExoplayerService"
    android:exported="true" android:label="@string/app_name" android:enabled="true">
    <intent-filter>
      <action android:name="com.udemy.android.player.exoplayer.UdemyExoplayerService">
      </action>
    </intent-filter>
</service>

With both the service and the activity using the same instance of the player, there is no longer a need to sync between them as there was in the old media player. When the activity resumes, we can reconnect the surface from the view to the player and continue playing seamlessly.

setPlayerSurface(surfaceView.getHolder().getSurface());

And in the player:

public void setSurface(Surface surface) {
  this.surface = surface;
  pushSurface(false);
}

Subtitles (16:58)

The Udemy app also supports subtitles, and we ran into some issues with ExoPlayer's subtitle support:

  1. It does not support .srt files (which are the majority of our subtitles).
  2. If you want to instantiate subtitles, you would instantiate another renderer, which means that the video will completely crash in case of error. We wanted the video to keep playing and just not play the subtitles if they were faulty.
  3. We wanted to support multiple encodings, e.g. UTF-8. We have subtitles in many different languages, so we needed that functionality working.

Our solution was to feed the subtitles manually.

For parsing the file and figuring out the timestamps, we used a subtitle conversion library.

public void displayExoplayerSubtitles(
    File file,
    final MediaController.MediaPlayerControl playerControl,
    final ViewGroup subtitleLayout,
    final Context context) {
  // Convert the subtitle file into the caption list
  convertFileCaptionList(file, context);
  runnableCode = new Runnable() {
    @Override
    public void run() {
      // Show the caption for the current position, then check again in 200 ms
      displayForPosition(playerControl.getCurrentPosition(), subtitleLayout, context);
      handler.postDelayed(runnableCode, 200);
    }
  };
  handler.post(runnableCode);
}
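For readers who have not worked with .srt files, the two steps behind this polling loop are turning the file into timed cues and looking up the cue for the current position. The following is a minimal sketch of both, assuming well-formed SRT input. It is not the conversion library we used; `SrtSketch`, `Cue`, `parse`, and `textAt` are hypothetical names, and the text is assumed to be already decoded into a String with the right charset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrtSketch {
    /** One subtitle entry with its display window in milliseconds. */
    static final class Cue {
        final long startMs;
        final long endMs;
        final String text;
        Cue(long startMs, long endMs, String text) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.text = text;
        }
    }

    // Matches a timing line such as "00:01:02,500 --> 00:01:05,000".
    private static final Pattern TIMING = Pattern.compile(
        "(\\d+):(\\d{2}):(\\d{2})[,.](\\d{3})\\s*-->\\s*(\\d+):(\\d{2}):(\\d{2})[,.](\\d{3})");

    static long toMs(String hours, String minutes, String seconds, String millis) {
        long totalSeconds =
            (Long.parseLong(hours) * 60 + Long.parseLong(minutes)) * 60 + Long.parseLong(seconds);
        return totalSeconds * 1000 + Long.parseLong(millis);
    }

    /** Splits the file into blank-line-separated blocks and reads one cue per block. */
    static List<Cue> parse(String srt) {
        List<Cue> cues = new ArrayList<>();
        for (String block : srt.trim().split("\\r?\\n\\s*\\r?\\n")) {
            String[] lines = block.split("\\r?\\n");
            for (int i = 0; i < lines.length; i++) {
                Matcher m = TIMING.matcher(lines[i]);
                if (!m.find()) {
                    continue; // counter line or stray text before the timing line
                }
                StringBuilder text = new StringBuilder();
                for (int j = i + 1; j < lines.length; j++) {
                    if (text.length() > 0) {
                        text.append('\n');
                    }
                    text.append(lines[j]);
                }
                cues.add(new Cue(
                    toMs(m.group(1), m.group(2), m.group(3), m.group(4)),
                    toMs(m.group(5), m.group(6), m.group(7), m.group(8)),
                    text.toString()));
                break;
            }
        }
        return cues;
    }

    /** Returns the text to show at the given position, or null when no cue is active. */
    static String textAt(List<Cue> cues, long positionMs) {
        for (Cue cue : cues) {
            if (positionMs >= cue.startMs && positionMs < cue.endMs) {
                return cue.text;
            }
        }
        return null;
    }
}
```

A faulty file in this scheme simply produces fewer cues or none, so the worst case is missing subtitles while the video keeps playing.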

Variable Playback Speeds (18:27)

Variable playback speed (the ability to play a video at a slower or faster pace) was one of the top feature requests, and something we could not do with MediaPlayer. It was one of the first things we added to ExoPlayer.

In ExoPlayer, video always follows audio. There is always logic to keep them in sync, and it is always about audio first. Our idea was: if you make the audio run faster, the video will follow along and will also play faster. There is a library, Sonic, that takes an audio buffer, makes it faster or slower, and returns a new buffer.

The component that we need to extend is the audio renderer.

public class VariableSpeedAudioRenderer
    extends MediaCodecAudioTrackRenderer


// Method to override

private byte[] sonicInputBuffer;
private byte[] sonicOutputBuffer;

@Override
protected void onOutputFormatChanged(final MediaFormat format) {

We wanted to tell the audio renderer that instead of using whatever buffer it has, we will take that buffer, pass it to Sonic, get the buffer back from Sonic and use it. We extended the MediaCodecAudioTrackRenderer, then we overrode two methods.

The first method was onOutputFormatChanged, which is called most of the time when a track is first instantiated. In this method we set up all the Sonic-related buffers and the Sonic instance itself. The last part flushes the stream (in case it was previously playing at a different speed) and sets the speed to whatever the user has selected.

// Two samples per frame * 2 to support audio speeds down to 0.5
final int bufferSizeBytes = SAMPLES_PER_CODEC_FRAME * 2 * 2 * channelCount;

this.sonicInputBuffer = new byte[bufferSizeBytes];
this.sonicOutputBuffer = new byte[bufferSizeBytes];

this.sonic = new Sonic(
  format.getInteger(MediaFormat.KEY_SAMPLE_RATE),
  format.getInteger(MediaFormat.KEY_CHANNEL_COUNT));

this.lastInternalBuffer = ByteBuffer.wrap(sonicOutputBuffer, 0, 0);

sonic.flushStream();
sonic.setSpeed(audioSpeed);

The second method we overrode was processOutputBuffer, which is the part that actually takes the buffer and plays it. In it, we get the buffer, write it to Sonic, and then read back the buffer that is now modified with our speed. We give that to the superclass to continue using.

@Override
protected boolean processOutputBuffer(..., final ByteBuffer buffer,...)
private ByteBuffer lastInternalBuffer;

buffer.get(sonicInputBuffer, 0, bytesToRead);
sonic.writeBytesToStream(sonicInputBuffer, bytesToRead);
sonic.readBytesFromStream(sonicOutputBuffer, sonicOutputBuffer.length);

return super.processOutputBuffer(..., lastInternalBuffer, ...);
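The write-then-read pattern is easier to follow in isolation. The sketch below uses a toy stand-in for Sonic that changes speed by naively dropping or repeating 16-bit mono samples. That is an assumption made purely for illustration: unlike Sonic, this approach also changes the pitch, and `ToySpeedStream` is a hypothetical class whose method names merely mirror the calls in the snippet above.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class ToySpeedStream {
    private static final int BYTES_PER_SAMPLE = 2; // assumes 16-bit mono PCM

    private float speed = 1.0f;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    void setSpeed(float speed) {
        if (speed <= 0) {
            throw new IllegalArgumentException("speed must be positive");
        }
        this.speed = speed;
    }

    /** Resamples the input by nearest neighbor and queues the result for reading. */
    void writeBytesToStream(byte[] in, int numBytes) {
        int inSamples = numBytes / BYTES_PER_SAMPLE;
        int outSamples = (int) (inSamples / speed);
        for (int i = 0; i < outSamples; i++) {
            int source = Math.min((int) (i * speed), inSamples - 1);
            pending.write(in, source * BYTES_PER_SAMPLE, BYTES_PER_SAMPLE);
        }
    }

    /** Copies up to maxBytes of queued output into out and returns the number of bytes copied. */
    int readBytesFromStream(byte[] out, int maxBytes) {
        byte[] queued = pending.toByteArray();
        int count = Math.min(Math.min(maxBytes, out.length), queued.length);
        System.arraycopy(queued, 0, out, 0, count);
        pending.reset();
        pending.write(queued, count, queued.length - count); // keep what was not read yet
        return count;
    }

    /** Mirrors the renderer flow: write the input buffer, then read the modified one back. */
    static byte[] changeSpeed(byte[] pcm, float speed) {
        ToySpeedStream stream = new ToySpeedStream();
        stream.setSpeed(speed);
        stream.writeBytesToStream(pcm, pcm.length);
        byte[] out = new byte[pcm.length * 2]; // room for speeds down to 0.5
        int read = stream.readBytesFromStream(out, out.length);
        return Arrays.copyOf(out, read);
    }
}
```

At speed 2.0 the output is half the size of the input, and at 0.5 it is twice the size, which is why the buffers in onOutputFormatChanged are sized with an extra factor of two.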

Some Optimization (21:28)

Sometimes processOutputBuffer is called again for a buffer index we have already seen, before the output we produced for it has been fully consumed. In that case we do not have to write into the Sonic buffer again. We can use what was already processed and feed it back.

if (bufferIndex == lastSeenBufferIndex) {
  return super.processOutputBuffer(..., lastInternalBuffer, ..., bufferIndex, ...);
} else {
  lastSeenBufferIndex = bufferIndex;
}

If we want to change the speed in the middle of consuming a buffer, we have to flush the stream and set the new speed. The next time a buffer is requested, it will go, “now I need to write and read from a new buffer”.

if (wasSpeedChanged) {
  sonic.flushStream();
  sonic.setSpeed(audioSpeed);
}

Effie Barak

Effie Barak works at Udemy as a Senior Android developer on the mobile team, responsible for Android app development and leading the development of the Android TV prototype. Effie began working as a C# developer 10 years ago. She became a mobile developer 4 years ago, and released many Windows and Windows Phone apps. After moving to San Francisco in 2013, Effie worked for Slack on the Windows Phone app.

Transcribed by Sandra Sanchez-Roige
Edited by Billy Leet