
The previous article briefly explained which technologies HTML5 live streaming needs and which live streaming protocols are involved. By contrast, this article focuses on the contents of the RTMP protocol itself, and works through it hands-on with Node.js Buffer exercises.


RTMP stands for Real-Time Messaging Protocol. As the name suggests, it's a protocol for real-time communication. The protocol was created by Adobe and is mainly used to transmit audio and video streams. It handles playback and related operations on a given broadcast stream through its own custom protocol. Compared with other live streaming protocols in use today, the main feature of RTMP is high efficiency, which I won't go into here. Let's start by looking at how the RTMP handshake works.

RTMP handshake

RTMP runs on top of a connection that has already completed the TCP three-way handshake, so the handshake discussed here isn't at the TCP level. RTMP itself relies on TCP for reliability. The handshake works as follows:

(C represents the client, and S represents the server.)

The handshake mainly completes a simple verification by negotiating field contents. The basic process is as follows:

  • Client: the client needs to send 3 packets: C0, C1, C2.

  • Server: the server also needs to send 3 packets: S0, S1, S2.

The entire process is as described above, but some details need to be noted.

Handshake starts:

【 1 】 The client sends the C0 and C1 packets.

At this point, the client is waiting. There are two restrictions on the client:

  • The client cannot send C2 until S1 has been received.

  • The client cannot send any actual data before S2 is received.

【 2 】 The server receives C0 and sends the S0 and S1 packets. It can also wait until it has received C1 before sending them, but that wait isn't required.

At this point, the server is waiting. There are two restrictions on the server:

  • The server cannot send S2 until C1 has been received.

  • The server cannot send any actual data before C2 is received.

【 3 】 The client receives the S1 packet and sends the C2 packet.

【 4 】 The server receives C2, returns the S2 packet, and the handshake is complete.

However, in practical applications, the steps above aren't followed strictly. RTMP isn't a strongly secure protocol, and S2/C2 only need the contents of C1/S1, so they can be produced by simply splicing that content together.

So, despite all these restrictions, the general pattern in practice is:

  • C0 + C1

  • S0 + S1 + S2

  • C2

Next, let's take a look at what the C/S 0, 1, and 2 packets each represent.

C0 && S0

C0 and S0 share the same format, so I'll explain them together here. First, the length of C0/S0 is 1B. Its main job is to determine the RTMP version number.

  • C0: the client sends the version number that it supports: 3 ~ 31. Generally 3 is written.

  • S0: the server returns the version number it supports. If the server doesn't support the client's version number, 3 is returned by default.

C1 && S1

The length of C1/S1 is 1536B. Its primary purpose is to ensure the uniqueness of the handshake. The format is:

  • Time: 4B, the sending timestamp. It isn't really important, but remember that it must not exceed the 4B range.

  • Zero: 4B, a reserved value of 0.

  • Random: 1528B, the long tail of the packet. The content is random data, produced however you like. It mainly ensures the uniqueness of this handshake and identifies it.

C2 && S2

The length of C2/S2 is also 1536B. It's effectively the response to S1/C1. As the handshake flow shows, it's a copy of C1/S1, except that the second field is different. The basic format is:

  • Time: 4B, a timestamp. As above, not very important.

  • Time2: 4B, the timestamp sent in C1/S1.

  • Random: the random data sent in S1/C1. Length 1528B.

One thing to mention here: RTMP fields are written and read in big-endian byte order, unless a field is explicitly stated to use little-endian byte order.
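The field layout above can be sketched in Node.js as follows. This is only a minimal sketch under stated assumptions: `buildC1` and `buildC2` are hypothetical helper names that don't appear in the original code, and filling the random field with `crypto.randomBytes` is just one possible choice.

```javascript
const crypto = require('crypto');

const HANDSHAKE_SIZE = 1536; // size of C1/S1/C2/S2 in bytes

// Hypothetical helper: build C1 = time (4B) + zero (4B) + random (1528B).
function buildC1(time = 0) {
  const buf = Buffer.alloc(HANDSHAKE_SIZE); // zero-filled
  buf.writeUInt32BE(time >>> 0, 0);         // time, big-endian
  // bytes 4..7 stay 0: the reserved "zero" field
  crypto.randomBytes(HANDSHAKE_SIZE - 8).copy(buf, 8); // random payload
  return buf;
}

// Hypothetical helper: build C2 from a received S1, following the format
// described above: time (4B) + time2 = timestamp from S1 (4B) + random from S1.
function buildC2(s1, time = 0) {
  const buf = Buffer.alloc(HANDSHAKE_SIZE);
  buf.writeUInt32BE(time >>> 0, 0);         // our own timestamp
  buf.writeUInt32BE(s1.readUInt32BE(0), 4); // timestamp copied from S1
  s1.copy(buf, 8, 8, HANDSHAKE_SIZE);       // random data copied from S1
  return buf;
}
```

Every multi-byte field is written with the `BE` variants of the Buffer methods, which matches the big-endian rule mentioned above.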

That covers the order of the handshake and the fields involved. It looks easy. But the goal isn't just to know what happens; to really understand it, we'll now perform an RTMP handshake ourselves. Here, we make the request as a client, assuming that the server responds according to the standard.

Buffer handshake

There are two main pieces of work: one is setting up the request to the server, and the other is assembling the Buffers.

Request server setup

The connection here uses the underlying TCP connection directly.

A simple template is as follows:

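const net = require('net');
const client = new net.Socket();
// The host was elided in the source; add a host option for a remote RTMP server.
client.connect({ port: 1935 }, () => console.log('connected'));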

However, to make the walkthrough cleaner, we dispatch incoming data through an EventEmitter-style approach. Here, we use the mitt module as the proxy.

const Emitter = require('mitt')();

Then, we only analyze the S0/S1/S2 packets that will be received. From the byte layouts above, you already know the details of each packet. Here, for simplicity, we ignore packets from other protocols and look only at RTMP packets. We also target only 3 packets: S0, S1, and S2. To achieve this, we need to add a hook on the data event.

Here, we use a live broadcast that is currently on air for the walkthrough.

Buffer operation

How to set up the connection can be found easily by searching online. The key is how to construct C0, C1, and C2 for the handshake. So here, we'll focus primarily on those operations.

Our main work is constructing C0/C1/C2. From the formats described above, you should already understand what C0/C1/C2 look like.

For example, time and random in C1 aren't actually required fields, so for simplicity we can set them to 0. The specific code is as follows:

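class C {
  constructor() {
    this.time = 0;      // timestamp read from S1
    this.random = null; // 1528B random data read from S1
  }
  C0() {
    let buf = Buffer.alloc(1);
    buf[0] = 3; // RTMP version 3
    return buf;
  }
  C1() {
    // time, zero and random are all left as 0 for simplicity
    let buf = Buffer.alloc(1536);
    return buf;
  }
  /**
   * write C2 package
   * uses this.time (the 4B Number of time from S1)
   * and this.random (the 1528 byte Buffer from S1)
   */
  produceC2() {
    let buf = Buffer.alloc(1536);
    //leave empty value as origin time
    buf.writeUInt32BE(this.time, 4);
    // copy the random data received in S1
    this.random.copy(buf, 8, 0, 1528);
    return buf;
  }
  get getC01() {
    return Buffer.concat([this.C0(), this.C1()]);
  }
  get C2() {
    return this.produceC2();
  }
}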

Next, let's take a look at the client code.

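const net = require('net');
const Client = new net.Socket();
const RTMP_C = new C();
// The host was elided in the source; add a host option for a remote server.
Client.connect({
  port: 1935,
}, () => {
  // connected: start the handshake with C0 + C1
  Client.write(RTMP_C.getC01);
});
// Assumes each data event carries whole handshake packets (no partial reads).
Client.on('data', res => {
  if (!res || res.length === 0) {
    console.warn('received empty Buffer ' + res);
    return;
  }
  //start to decode res package
  if (!RTMP_C.S0 && res.length > 0) {
    RTMP_C.S0 = res.readUInt8(0);
    res = res.slice(1);
  }
  if (!RTMP_C.S1 && res.length >= 1536) {
    RTMP_C.time = res.readUInt32BE(0);
    RTMP_C.random = res.slice(8, 1536);
    RTMP_C.S1 = true;
    res = res.slice(1536);
    console.log('send C2');
    Client.write(RTMP_C.C2);
  }
  if (!RTMP_C.S2 && res.length >= 1536) {
    RTMP_C.S2 = true;
    res = res.slice(1536);
  }
});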

The detailed code is linked from the original article.

Basic architecture of RTMP

Apart from the handshake, everything else in RTMP is a set of messages organized around the type id. To see the entire architecture more clearly, a diagram is shown here:

The 3 kinds of message under Message are what we're about to explain now.

You can see that all the items above have a common parent item: Message. Its basic structure is:

  • Header: the header section identifies the different type ids, telling the client the corresponding message type. In addition, it also plays a role in multiplexing.

  • Body: the body is the data actually being sent. Its format differs completely depending on the type id.

Next, let's take a look at the contents of the header and the different type ids:


Header

The header is divided into the basic header and the message header. It's important to note that they aren't independent but linked: the structure of the message header is determined by the contents of the basic header.

Next, we explain them separately:

Basic header

The BH (basic header) is primarily defined by the chunk stream id and the chunk type. It's important to note that the BH is variable length; that is, its length ranges from 1B to 3B. Why? The specific length depends on the chunk stream id. The chunk stream id (CS ID) itself supports values up to 65599, which would take up to 22 bits of a 3B header. Of course, to avoid spending 3B every time, Adobe came up with a relatively clever scheme in which the length is determined by the cs id field of the first byte, in the following format:

 0 1 2 3 4 5 6 7
 |fmt| cs id |

That is, the length of the entire BH is determined by bits 2-7 of the first byte. How exactly?

RTMP reserves the values 0, 1, and 2 for the cs id field, so your own cs ids can only start from 3.

  • cs id = 0 ==> the entire BH is 2B, which can represent chunk stream ids in the range 64-319. For example:

 0 1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
 |fmt| 0 | cs id - 64 |

Note the "cs id - 64" above. It means that you take the value of the second byte and add 64. That is, 2nd byte + 64 = CS ID.

  • cs id = 1 ==> the entire BH is 3B. The chunk stream ids that can be represented are 64-65599. For example:

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
 |fmt| 1 | cs id - 64 |

Of course, the cs id here is also the stored result plus 64; that is, 3rd byte * 256 + 2nd byte + 64 = CS ID.

  • cs id > 2 ==> the entire BH is 1B. The chunk stream ids that can be represented are 3-63. For example:

 0 1 2 3 4 5 6 7
 |fmt| cs id |

Finally, because 0, 1, and 2 are reserved values, they can't be used as an ordinary cs id. In summary, cs ids start from 3 (which doesn't mean it is the 3rd stream).
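The three basic header forms above can be parsed with a few lines of Buffer code. The following is a minimal sketch; `parseBasicHeader` is a hypothetical helper name, and it assumes `buf` already contains all the bytes of the basic header.

```javascript
// Hypothetical helper: parse the basic header at the start of `buf`.
// Returns fmt, the chunk stream id, and how many bytes the header occupies.
function parseBasicHeader(buf) {
  const fmt = buf[0] >> 6;   // top 2 bits of the first byte
  const low = buf[0] & 0x3f; // low 6 bits: the cs id field
  if (low === 0) {
    // 2-byte form: cs id = second byte + 64 (64..319)
    return { fmt, csId: buf[1] + 64, size: 2 };
  }
  if (low === 1) {
    // 3-byte form: cs id = third byte * 256 + second byte + 64 (64..65599)
    return { fmt, csId: buf[2] * 256 + buf[1] + 64, size: 3 };
  }
  // 1-byte form: the cs id is stored directly in the low 6 bits
  return { fmt, csId: low, size: 1 };
}
```

For example, `parseBasicHeader(Buffer.from([0x40, 0xff]))` yields fmt 1 with a cs id of 319, the largest value the 2-byte form can hold.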

I didn't explain the fmt field above; it's actually used to define the message header.


Message header

Based on the fmt field in the BH above, the MH (message header) can be divided into four types, that is, four formats, as follows:

fmt: 0

The length of the MH is 11B when fmt is 0. An MH of this type must be used at the beginning of a chunk stream, which includes a stream that is fetched again when seeking backward or playing on demand. The overall format of this structure is as follows:

That is, when fmt is 0, the MH is a complete header.

  • Timestamp: 3B, an absolute timestamp, used to represent the time of the current stream data.

  • Message length: 3B, the length of the message being sent.

  • Type id: 1B

  • Stream id: 4B, the value of the message stream id. It's written in little-endian format.

fmt: 1

The length of the MH is 7B when fmt is 1. This type of MH doesn't carry a msg stream id; the msg stream id is determined by the previous chunk, mainly by a preceding MH whose fmt is 0. This type of MH is usually placed after an fmt 0 chunk.

fmt: 2

The length of the MH is 3B when fmt is 2. This type of MH includes only one field, the timestamp delta. Other information is determined by the previous MH.

fmt: 3

When fmt is 3, there's actually no MH in the chunk at all. The official definition is that for this type, all header information is the same as in the preceding chunk. This is mainly used for chunked messages: the chunks are cut from the same message, so every chunk other than the first can use this type. When fmt is 3, computing its timestamp takes a little care. If a timestampDelta is present in the previous chunk, it's added to get the timestamp of the fmt 3 chunk; if it isn't, the timestamp of the previous chunk is added instead. Expressed in code:

prevChunk.timeStamp += prevChunk.timeStampDelta || prevChunk.timeStamp;

However, in practice fmt 3 is rarely encountered, because it requires the preceding chunks to have been sent with fmt 0/1/2.
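The four message header formats can be summarized in a short parser. This is a sketch only: `parseMessageHeader` and `MESSAGE_HEADER_SIZE` are hypothetical names, and the extended timestamp that the RTMP specification defines for 3B timestamp values of 0xFFFFFF is deliberately not handled.

```javascript
// Message header length in bytes for fmt 0, 1, 2 and 3, as described above.
const MESSAGE_HEADER_SIZE = [11, 7, 3, 0];

// Hypothetical helper: parse a message header of the given fmt at `offset`.
function parseMessageHeader(buf, fmt, offset = 0) {
  const header = { size: MESSAGE_HEADER_SIZE[fmt] };
  if (fmt <= 2) {
    // fmt 0 carries an absolute timestamp; fmt 1 and 2 carry a delta
    const time = buf.readUIntBE(offset, 3);
    if (fmt === 0) header.timestamp = time;
    else header.timestampDelta = time;
  }
  if (fmt <= 1) {
    header.messageLength = buf.readUIntBE(offset + 3, 3);
    header.typeId = buf.readUInt8(offset + 6);
  }
  if (fmt === 0) {
    // the message stream id is the one little-endian field
    header.streamId = buf.readUInt32LE(offset + 7);
  }
  return header;
}
```

An fmt 3 chunk returns only `{ size: 0 }`; the caller is expected to reuse the values remembered from the previous chunk on the same chunk stream.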

Next is the message body part.


Message body

The message header described above is the common part of every message, but for each specific message the body has a different format. The message types are the ones shown in the earlier diagram:

Here, we'll explain them in the order of that diagram.


PCM

PCM stands for protocol control message. It's primarily used to communicate connection information for the initial state, such as window size, chunk size, and so on.

There are 5 different PCM types, distinguished by the type id in the header, which is 1 ~ 6 (excluding 4). In addition, note that the message stream id and chunk stream id in the header need to be set to fixed values:

  • Message stream id is 0

  • Chunk stream id is 2

As shown:

Let's introduce them one by one:

Set chunk size ( 1 )

From the name, you can probably guess what this message is for. SCS sets the size of the chunks used for formal data transfer between server and client, and its type id is 1. So what does that mean?

SCS (set chunk size) is the upper limit on the size of each chunk when formal data is delivered. The default is 128B. However, if the server thinks that is too small and wants to send larger chunks, such as 132B, then the server needs to send you an SCS.

  • 0: 1 bit, can only be set to 0; it indicates the type of the current PCM.

  • Chunk size: the size used for the formal data being sent. Range 1-16777215.
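Putting the basic header, the fmt 0 message header, and the SCS body together, a complete Set Chunk Size chunk could be built roughly as below. `buildSetChunkSize` is a hypothetical helper, and the timestamp is assumed to be 0.

```javascript
// Hypothetical helper: build a complete Set Chunk Size chunk
// (1B basic header + 11B message header + 4B body).
function buildSetChunkSize(chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1 || chunkSize > 16777215) {
    throw new RangeError('chunk size must be in 1..16777215');
  }
  const buf = Buffer.alloc(1 + 11 + 4);
  buf.writeUInt8((0 << 6) | 2, 0);  // basic header: fmt 0, cs id 2
  buf.writeUIntBE(0, 1, 3);         // timestamp (assumed 0 here)
  buf.writeUIntBE(4, 4, 3);         // message length: the body is 4B
  buf.writeUInt8(1, 7);             // type id 1 = set chunk size
  buf.writeUInt32LE(0, 8);          // message stream id 0, little-endian
  buf.writeUInt32BE(chunkSize, 12); // leading bit stays 0 due to the range check
  return buf;
}

console.log(buildSetChunkSize(4096).toString('hex'));
// prints "02000000000004010000000000001000"
```

The fixed values (message stream id 0, chunk stream id 2) are the ones required for every PCM, as noted above.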

The following is the result of a packet capture:

Abort message ( 2 )

This type of PCM tells the client that the specified chunk stream has been abandoned, whether its message is half loaded or not loaded at all. It needs to specify a chunk stream id.

The basic format is:

  • Chunk stream id: specifies the chunk stream whose message is to be discarded

Acknowledgement ( 3 )

This protocol message is actually an ACK packet. It's rarely used in practice; it mainly serves as an ACK packet that reports how much data has been received so far.

Its basic format is:

  • Sequence number [ 4B ]: size 4B

However, in actual applications, it doesn't appear very often.

Window acknowledgement size ( 5 )

This is used to negotiate the size of the send window. It's different from the chunk size above: it concerns the maximum amount of data the client can accept before acknowledging, not the size of each chunk. It's also known as the window size. A typical setting is 500000.

The detailed format is:

The result of the packet capture is:

Set peer bandwidth ( 6 )

This is the last PCM packet. Its job is to adjust the window size according to network speed. Its format is similar to WAS (window acknowledgement size), but it's followed by a limit type that indicates the bandwidth limit algorithm in use. When the peer receives this message, if the window size differs from the previous WAS, it needs to return a WAS to acknowledge the change.

The basic format is:

The limit type has 3 values:

  • 0: hard, the bandwidth is limited to the window size set in this message.

  • 1: soft, the bandwidth is limited either to the window size defined in this message or to the window size already in effect, whichever is smaller.

  • 2: dynamic, if the previous limit type was hard, this message is also treated as hard; otherwise the message is ignored.
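The three limit types can be expressed as a small state update. This is only an illustrative sketch: `applyPeerBandwidth` and the shape of the `state` object are assumptions made for the example, not part of the original code.

```javascript
const LIMIT_HARD = 0;
const LIMIT_SOFT = 1;
const LIMIT_DYNAMIC = 2;

// Hypothetical helper: given the state currently in effect
// ({ windowSize, limitType }) and a received Set Peer Bandwidth message,
// return the state that should be in effect afterwards.
function applyPeerBandwidth(state, windowSize, limitType) {
  switch (limitType) {
    case LIMIT_HARD:
      return { windowSize, limitType: LIMIT_HARD };
    case LIMIT_SOFT:
      // keep whichever window size is smaller
      return { windowSize: Math.min(windowSize, state.windowSize), limitType: LIMIT_SOFT };
    case LIMIT_DYNAMIC:
      // treated as hard only if the previous limit was hard, otherwise ignored
      return state.limitType === LIMIT_HARD
        ? { windowSize, limitType: LIMIT_HARD }
        : state;
    default:
      throw new RangeError('unknown limit type ' + limitType);
  }
}
```

A caller would compare the returned window size with the previous one and send a WAS back when it changed.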

An actual capture can be used for reference:


UCM

UCM stands for user control message. Its type id can only be 4. It mainly sends control information about the stream. The conditions under which it's sent are also restricted:

  • Msg stream id is 0

  • Chunk stream id is 2

The basic format of its body part is:

The body is set up differently depending on the event type. The event types include Stream Begin (0), Stream EOF (1), StreamDry (2), SetBuffer Length (3), StreamIs Recorded (4), PingRequest (6), PingResponse (7), and so on.

Here, by importance, we only cover three of them: Stream Begin, Stream EOF, and SetBuffer Length.

  • Stream Begin: event type 0. It's usually sent once the client and server have connected successfully. The event data is 4B, and its content is the id of the stream that can now formally be used to transfer data (in practice it's of little use).

  • Stream EOF: event type 1. It usually appears when the stream has been played to the end. The event data is 4B and indicates the id of the stream whose playback has finished (in practice it's of little use).

  • Set buffer length: event type 3. It's primarily used to notify the server of the size, in milliseconds, of the buffer used for the stream. The first 4B of event data represents the stream id, and the following 4B represents the buffer length in milliseconds. Usually 3000 ms.
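A UCM body can be decoded roughly as follows. The sketch relies on the layout from the RTMP specification, where the body starts with a 2B big-endian event type followed by the event data; `parseUserControl` and `EVENT_NAMES` are hypothetical names, and only the three events discussed above are decoded in detail.

```javascript
const EVENT_NAMES = {
  0: 'StreamBegin',
  1: 'StreamEOF',
  2: 'StreamDry',
  3: 'SetBufferLength',
  4: 'StreamIsRecorded',
  6: 'PingRequest',
  7: 'PingResponse',
};

// Hypothetical helper: decode the body of a user control message.
function parseUserControl(payload) {
  const eventType = payload.readUInt16BE(0);
  const event = { eventType, name: EVENT_NAMES[eventType] || 'Unknown' };
  if (eventType === 0 || eventType === 1) {
    event.streamId = payload.readUInt32BE(2);       // 4B stream id
  } else if (eventType === 3) {
    event.streamId = payload.readUInt32BE(2);       // first 4B: stream id
    event.bufferLengthMs = payload.readUInt32BE(6); // next 4B: buffer length in ms
  }
  return event;
}

console.log(parseUserControl(Buffer.from('00030000000100000bb8', 'hex')));
// → { eventType: 3, name: 'SetBufferLength', streamId: 1, bufferLengthMs: 3000 }
```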

OK, what's left is the contents of the command msg.

Command msg

The type ids of the command msg family cover the values between 8 ~ 22. The details are listed in the following table:

It's worth noting why some entries have two ids; this is related to the AMF version. The first id represents AMF0 encoding, and the second id represents AMF3 encoding.
The important ones are the command msg, video msg, and audio msg. To get a better understanding of the stream, we first cover the two msgs for video and audio.

Video msg

Because RTMP comes from Adobe, the format used internally is naturally FLV. But it isn't exactly the same, because the FLV format contains many tags and associated description information. So how does RTMP handle this? Does it transmit the FLV file directly, or does it use a custom protocol to segment the tags?

This is an easy one to answer. RTMP is a long-lived connection, and if it simply sent an FLV file, there would be no need for RTMP at all. In summary, RTMP segments the FLV tags according to its own custom protocol. What's the specific protocol?

This isn't actually given in the official document. It just tells us that the type id is 9.

Because RTMP is only a transport tool, what's actually passed through it is decided by the specific stream generation framework. So here, I've chosen a very representative live broadcast to explain.

You can obtain the following packet data by capturing packets:

One point to mention here: because RTMP sends video and audio separately, it needs to interleave the video and audio to keep them synchronized. So, what's the data in each video msg?

If you look at the FLV format, what's being transmitted is the VideoData tag. Here are the contents of the VideoData tag:

This is the protocol format for FLV video. But at the first field, FrameType, you may be surprised to see that there are 5 cases. Will RTMP give you five different kinds of packet?

The answer is that it's possible, but in most cases we only need to support 1 and 2. The most important thing in video is the I frame, which corresponds to FrameType 1, while B/P frames correspond to FrameType 2. We can obtain all the information in the video as long as we handle 1 and 2.

So, in RTMP, what's mainly (or most often) transmitted is the two FrameTypes above. We'll explain it through actual captures.

This is a keyframe packet. Notice the number 17 at the beginning of the buffer. You can look up the corresponding FrameType above and check whether the results are consistent:

This is an inter-frame packet. You can compare it in the same way.
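To check a captured packet against the FrameType field, the first byte of the video data can be split as below. The sketch assumes that the "17" seen in the capture is the hexadecimal byte 0x17 and follows the FLV VideoData layout, where the upper 4 bits are the frame type and the lower 4 bits are the codec id (7 being AVC). `parseVideoTagHeader` is a hypothetical helper name.

```javascript
// Hypothetical helper: split the first byte of FLV VideoData into
// frame type (upper 4 bits) and codec id (lower 4 bits).
function parseVideoTagHeader(firstByte) {
  return {
    frameType: firstByte >> 4, // 1 = keyframe, 2 = inter frame
    codecId: firstByte & 0x0f, // 7 = AVC
  };
}

console.log(parseVideoTagHeader(0x17)); // → { frameType: 1, codecId: 7 }
console.log(parseVideoTagHeader(0x27)); // → { frameType: 2, codecId: 7 }
```

Under that assumption, a leading 0x17 is a keyframe and a leading 0x27 is an inter frame, both carrying AVC data.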

Audio tag

The audio msg likewise carries the same data as the FLV audio tag. Look at the contents of the FLV audio tag:

These fields are all configuration values; in other words, you must implement them. Here, the audio msg is a bit different from the video msg. Because the audio tag can't be further split out into a config tag, RTMP passes the audio tag content above directly. You can refer to the capture for the details.

That's all there is to the audio msg.

Audio and video are sent separately, so later on we need to pay attention to synchronizing the two. While we're here, let's talk about audio and video synchronization.

Audio and video synchronization

There are three ways to synchronize audio and video:

  • Sync to audio: audio is the reference and video follows it.

  • Sync to video: video is the reference and audio follows it.

  • Sync to an external timestamp: audio and video are both synchronized to it.

The main reference variables are timestamp and duration. Because we're mainly doing live streaming here, the second method, syncing to video, is recommended. In actual development, generating the file requires the first frame to be a keyframe, which introduces some extra cases to handle. The solution is to check whether the last segment ended on a keyframe, and then decide whether to move on to the next round of synchronization.

Here, I'll briefly describe the sync-to-video method. Leaving aside the first-frame issue, syncing to video doesn't need to care much about the audio data, because the audio data is very simple. The following is a description:

//known condition: video has timeStamp, perDuration and wholeDuration;
//audio has timeStamp and perDuration
const refDuration = video.timeStamp + video.wholeDuration;
const delta = refDuration - audio.timeStamp;
const audioCount = Math.round(delta / audio.perDuration);
const audDemuxArr = this._tmpArr.splice(0, audioCount);
//begin to demux
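The same calculation can be wrapped in a self-contained function. This is a sketch under stated assumptions: `takeAudioForVideo` is a hypothetical name, `audioQueue` stands in for `this._tmpArr`, each queued audio sample is assumed to carry `timeStamp` and `perDuration`, and the clamp to 0 is an addition for the case where audio is already ahead of video.

```javascript
// Hypothetical helper: take from `audioQueue` the audio samples that fall
// within the span covered by `video` (timeStamp + wholeDuration).
function takeAudioForVideo(video, audioQueue) {
  if (audioQueue.length === 0) return [];
  const first = audioQueue[0];
  const refDuration = video.timeStamp + video.wholeDuration;
  const delta = refDuration - first.timeStamp;
  // never take a negative number of samples when audio is already ahead
  const audioCount = Math.max(0, Math.round(delta / first.perDuration));
  return audioQueue.splice(0, audioCount); // removes them from the queue
}
```

Because only a count is computed, the audio samples are never compared with the video samples one by one.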

The above algorithm avoids comparing audio against video sample by sample, and ensures that video always stays ahead of audio. Next, we return to RTMP itself, to see what's inside the command msg.

Command msg

Command msg is one of the main means of delivering information in RTMP. It's often used in processes such as connect and play. The command msg is transmitted in AMF form (in fact, a set of binary encoding rules similar to JSON). The command msg is mainly divided into NetConnection and NetStream. Its communication is two-way; that is, after you send a NetConnection or NetStream msg, the other end must return a _result or _error to indicate that the information was received. The detailed structure can be seen in the following image:

After that, we'll cover them in two parts:

  • NetConnection

  • NetStream

The _result and _error will be explained along with each packet.


NetConnection

NetConnection can be divided into 4 msgs: connect, call, createStream, close.


connect

connect is the request the client sends to the server to connect for playback. The fields are:

  • Command name [ string ]: default is connect. The name of the message.

  • Transaction id [ number ]: default is 1.

  • Command object: holds the relevant information as key-value pairs.

  • Optional: generally not present.

So, what can be stored in the command object?

  • app [ string ]: the name of the application on the server to connect to. This depends on your server settings. For example, live.

  • flashVer [ string ]: the version number of the Flash Player. Usually it depends on the model of your device. You can also use a default value: LNX 9,0,124,2.

  • tcUrl [ string ]: the url address of the server. Simply put, it's protocol://host/path. For example: rtmp:// .

  • fpad [ boolean ]: indicates whether a proxy is used. Usually false.

  • audioCodecs [ number ]: the audio codecs supported. They'll be introduced later. The default can be set to 4071.

  • videoCodecs [ number ]: the video codecs supported. A custom standard. The default can be set to 252.

  • videoFunction [ number ]: indicates which special video functions are invoked on the server. The default can be set to 1.
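Since the command msg is AMF-encoded, the connect fields above end up as AMF0 bytes on the wire. The following is a deliberately small sketch of an AMF0 encoder that covers only the value types connect needs (number, boolean, string, object, null); `amf0Encode` and `buildConnectBody` are hypothetical names, and strings longer than 65535 bytes are not handled.

```javascript
// Minimal AMF0 encoder covering only the value types used by connect.
function amf0Encode(value) {
  if (value === null) return Buffer.from([0x05]); // null marker
  switch (typeof value) {
    case 'number': {
      const buf = Buffer.alloc(9);
      buf[0] = 0x00;               // number marker
      buf.writeDoubleBE(value, 1); // 8B IEEE-754 double, big-endian
      return buf;
    }
    case 'boolean':
      return Buffer.from([0x01, value ? 1 : 0]);
    case 'string': {
      const str = Buffer.from(value, 'utf8');
      const head = Buffer.alloc(3);
      head[0] = 0x02;              // string marker
      head.writeUInt16BE(str.length, 1);
      return Buffer.concat([head, str]);
    }
    case 'object': {
      const parts = [Buffer.from([0x03])]; // object marker
      for (const [key, val] of Object.entries(value)) {
        const name = Buffer.from(key, 'utf8');
        const len = Buffer.alloc(2);
        len.writeUInt16BE(name.length, 0);
        parts.push(len, name, amf0Encode(val));
      }
      parts.push(Buffer.from([0x00, 0x00, 0x09])); // object end marker
      return Buffer.concat(parts);
    }
    default:
      throw new TypeError('unsupported AMF0 type: ' + typeof value);
  }
}

// Hypothetical helper: command name + transaction id + command object.
function buildConnectBody(commandObject) {
  return Buffer.concat([
    amf0Encode('connect'),
    amf0Encode(1),
    amf0Encode(commandObject),
  ]);
}
```

A call such as `buildConnectBody({ app: 'live', flashVer: 'LNX 9,0,124,2', fpad: false, audioCodecs: 4071, videoCodecs: 252, videoFunction: 1 })` uses the default values listed above; tcUrl is left out here because its value depends on your server.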

In short, the command object plays the role of a route, used to request a specific resource path. For actual data, you can refer to the capture results:

The specific values above are defined in the official document. If you'd rather not look them up, you can use the values above directly; they offer good compatibility. After this packet is sent successfully, the other end needs to return a response packet, specifically:

  • Command name [ string ]: _result or _error.

  • Transaction id [ number ]: default is 1.

  • Command object: holds the relevant information as key-value pairs.

  • Information object: key-value pairs describing the associated response information. Its fields include: level, code, description.

You can refer to:

The connect packet is mainly sent right after the handshake. As follows:


call

The main function of the call packet is to perform a remote procedure call (RPC). However, it isn't really used in the live streaming process. Here's a brief introduction to the format. Its content is similar to connect:

  • Procedure name [ string ]: the name of the procedure being called.

  • Transaction id [ number ]: if you expect a response, you need to set an id. Otherwise 0.

  • Command object: holds the relevant information as key-value pairs. AMF0/3

  • Optional: generally not present.

The contents of the command object are mainly aimed at the procedure, setting the associated calling parameters. Because the content isn't fixed, it isn't covered here.

A call generally requires a response indicating whether the remote procedure was executed, and whether it executed successfully. The returned format is:

  • Command name [ string ]: depends on the command object parameters in the call.

  • Transaction id [ number ]: if you expect a response, you need to set an id. Otherwise 0.

  • Command object: holds the relevant information as key-value pairs. AMF0/3

  • Response [ object ]: the result of the response


createStream

The createStream packet is just used to tell the server that we're now creating a channel for communication. Its format and content aren't complex:

  • Procedure name [ string ]: the name of the procedure being called.

  • Transaction id [ number ]: set one yourself. It can be set to 2.

  • Command object: holds the relevant information as key-value pairs. AMF0/3

Afterwards, the server returns a _result or _error packet to indicate whether the request was received successfully. The details are:

  • Command name [ string ]: depends on the command object parameters in the call.

  • Transaction id [ number ]: if you expect a response, you need to set an id. Otherwise 0.

  • Command object: holds the relevant information as key-value pairs. AMF0/3. Generally null.

  • Stream id: the returned stream id value.

Its return value is fairly arbitrary; refer to the capture:

Next, let's take a look at the second kind of command msg: NetStream.

NetStream msg

There are a lot of NetStream msgs, but in live streaming the most important one is the play packet. So here we focus on the play packet.


play

The play packet is mainly used to tell the server to formally start playing the audio and video stream. In addition, because RTMP inherently supports multiple streams, if the network fluctuates, the client can call the play command multiple times to switch between streams of different modes.

The basic format is:

  • Command name [ string ]: the name of the command, here play.

  • Transaction id [ number ]: default is 0. It can also be set to other values.

  • Command object: this field isn't required, in which case it's set to null by default.

  • Stream name [ string ]: used to specify the video file to play. Because RTMP natively supports FLV, no extra identifier needs to be added for an FLV file; just specify the file name. For example:

StreamName: '6721_75994f92ce868a0cd3cc84600a97f75c'
  • However, if you want to support other file types, an extra identifier is required. Of course, audio and video require different handling:

    • If you play audio files, such as mp3, you need an additional prefix identifier, mp3. For example, mp3:6721_75994f9.

    • If you play video files, you need not only a prefix but also a suffix. For example, an mp4 file is identified as: mp4:6721_75994f9.mp4.

  • Start [ number ]: this field is actually a bit interesting. Its values fall into 3 classes: -2, -1, >= 0.

    • -2: with this value, the server first looks for the corresponding live_stream. If there isn't one, it looks for a record_stream. If that isn't found either, the request is suspended until the next live_stream becomes available.

    • -1: only a live_stream will be played.

    • >= 0: equivalent to seeking in the video. It finds the record_stream directly and determines the start time of playback based on the value of this field. If it isn't found, the next video in the list is played.

  • Duration [ number ]: used to set the playback duration. It also has several values that need explaining: -1, 0, > 0.

    • -1: plays until the live_stream or record_stream ends.

    • 0: a single frame is played.

    • > 0: the live stream is played for the specified duration. Beyond that, the record_stream for the specified time period is played.

  • Reset [ boolean ]: this field isn't very useful and can generally be ignored. It indicates whether to drop the previous playlist.

That covers the entire play packet. We can look at the actual play capture:

At which step is the play packet sent? And does a corresponding _result packet need to be returned after it's sent?

The play packet is special and doesn't need a _result. Once the play packet has been received successfully, the server directly starts the streamBegin operation.

The whole process is:

At this point, you can formally begin to receive the video and audio streams.
