HTTP Streaming (or Chunked vs Store & Forward)
==============================================

The standard way of understanding the HTTP protocol is via the request-response 
pattern. Each HTTP transaction consists of a finitely bounded HTTP request and 
a finitely bounded HTTP response.

However, it's also possible for both parts of an HTTP 1.1 transaction to stream 
their data, which may even be unbounded. The advantage is that the sender can 
send data that exceeds its own memory limit, and the receiver can act on the data 
stream in chunks immediately instead of waiting for all of the data to arrive. 
Basically you're either saving space or you're saving time. The advantages of 
streaming are elaborated in Wikipedia's [Online algorithm article](https://en.wikipedia.org/wiki/Online_algorithm).

Note that HTTP streaming involves only the HTTP protocol, not WebSockets. 
Streaming is also the basis for HTML5 server-sent events.

So we're going to look at HTTP streaming architecture, and how to achieve 
streaming in a few different languages.

The first thing to understand is that HTTP streaming involves streaming within 
a single HTTP transaction. In a larger context, each HTTP transaction itself 
represents an event as part of a larger event stream. This reveals that 
"streaming" is a context-specific concept: it's relative to what we consider 
the "stream" to be.

First, we have to consider the HTTP headers that support streaming. Keep Wikipedia's 
[list of HTTP header fields](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields) open for reference:

## Content-Length ##

The `Content-Length` header specifies the byte length of the request/response 
body. If you neglect to specify the `Content-Length` header, many HTTP servers will 
implicitly add a `Transfer-Encoding: chunked` header. In that case the receiver has 
no idea what the length of the body is and cannot estimate the download completion 
time. The `Content-Length` and `Transfer-Encoding` headers should not be used 
together. If you do add a `Content-Length` header, make sure it matches the entire 
body in bytes; if it is incorrect, the behaviour of receivers is undefined.

Because `Content-Length` must be known before the body is sent, it doesn't suit streams of 
unknown length, but it is useful for large binary files, where you want to support partial 
content serving. This basically means resumable downloads, paused downloads, partial downloads, 
and multi-homed downloads. This relies on the `Range` request header, which the server answers 
with a `206 Partial Content` status and a `Content-Range` header. This technique is called 
[Byte serving](https://en.wikipedia.org/wiki/Byte_serving).

## Transfer-Encoding ##

The use of `Transfer-Encoding: chunked` is what allows streaming within a single 
request or response. This means that the data is transmitted in a chunked manner, 
and does not impact the representation of the content.

An HTTP client can send a `TE` header field that specifies which transfer encodings, 
besides `chunked`, it is willing to accept. This header is often not sent, and it doesn't 
need to be for chunked data: every HTTP 1.1 recipient is required to be able to decode 
`chunked` encoding, so servers can assume that HTTP 1.1 clients can process it.

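Chunked transfer encoding makes better use of persistent TCP connections, which HTTP 1.1 
assumes by default: the zero-length final chunk marks the end of the body, so the connection 
doesn't have to be closed to signal that the message is complete.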

Chunked data is represented in this manner: 

```
4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0\r\n
\r\n
```

Each chunk starts with its byte length expressed as a hexadecimal number, followed by 
optional parameters (chunk extensions) and a CRLF sequence, followed by the chunk data and 
another CRLF. The body ends with a zero-length chunk (`0\r\n`), optionally followed by 
trailer fields, and then a final CRLF.
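Generating this format is straightforward. Below is a minimal Python sketch of a chunk encoder (`encode_chunked` is a hypothetical name) that turns a sequence of byte strings into the wire format shown above, without chunk extensions or trailers.

```python
from typing import Iterable, Iterator


def encode_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the chunked wire format for each piece, then the terminating chunk."""
    for piece in pieces:
        if not piece:
            continue  # an empty piece would be read as the last chunk
        yield f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    yield b"0\r\n\r\n"  # zero-length last chunk, no trailers, final CRLF
```

Feeding it `b"Wiki"`, `b"pedia"` and `b" in\r\n\r\nchunks."` reproduces the example byte for byte, including the `e` (14) size line for the third chunk.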

Chunk extensions can be used to indicate a message digest or an estimated progress. 
They are just custom metadata that your layer 7 receiver needs to parse. Only their 
`;name=value` syntax is standardised, not their meaning. Because of this, it's probably 
better to just add your metadata (if any) into the chunk itself for your layer 7.5 
application to parse.
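On the receiving side, a parser has to strip off chunk extensions even if it ignores them. This Python sketch (`decode_chunked` is a hypothetical name) reads a chunked body from a binary stream and yields only the chunk data. It assumes a well-behaved sender and does no size limiting.

```python
import io
from typing import Iterator


def decode_chunked(stream: io.BufferedIOBase) -> Iterator[bytes]:
    """Yield each chunk's data from a chunked body, discarding chunk extensions."""
    while True:
        size_line = stream.readline()
        if not size_line:
            raise EOFError("connection closed before the last chunk")
        # "1a;name=value\r\n": everything after ';' is an extension, ignored here.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Skip optional trailer fields up to the empty line that ends the body.
            while stream.readline() not in (b"\r\n", b"\n", b""):
                pass
            return
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("malformed chunk")
        yield data
```

Wrapping `b"4;progress=40\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"` in `io.BytesIO` yields `b"Wiki"` and then `b"pedia"`. The `progress=40` extension is made up for illustration.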

For your application to send out chunked data, you must first send the 
`Transfer-Encoding: chunked` header, and then flush content in chunks according to 
the chunk format. If your HTTP server doesn't handle this for you, you need to implement 
the chunk generator yourself. Sometimes you can use a library that provides an abstract 
interface instead.

For example, in PHP there's the [Symfony HTTP Foundation Stream Response](http://symfony.com/doc/current/components/http_foundation/introduction.html#streaming-a-response), 
and in NodeJS, the [native HTTP module chunks streamed responses](https://nodejs.org/api/http.html#http_response_write_chunk_encoding_callback) that don't set a `Content-Length`.
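If you do have to emit the chunk syntax yourself, the sequence is: send the headers including `Transfer-Encoding: chunked`, then write and flush each chunk. Here is a minimal sketch using Python's standard-library `http.server` rather than PHP or NodeJS. The handler, the three `event N` lines and the 0.1 second delay are hypothetical stand-ins for real work.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StreamingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked encoding does not exist in HTTP/1.0

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")  # headers go out first...
        self.end_headers()
        for i in range(3):
            data = f"event {i}\n".encode("utf-8")
            # ...then each chunk as "<hex size>\r\n<data>\r\n", flushed right away.
            self.wfile.write(f"{len(data):x}\r\n".encode("ascii") + data + b"\r\n")
            self.wfile.flush()
            time.sleep(0.1)  # stand-in for slow work between chunks
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the body

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def serve_in_background() -> ThreadingHTTPServer:
    """Start the demo server on a free local port in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StreamingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int) -> tuple:
    """Request the stream; http.client decodes the chunked body for us."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    resp = conn.getresponse()
    result = (resp.getheader("Transfer-Encoding"), resp.read())
    conn.close()
    return result
```

Python's `http.server` is meant for testing rather than production, but it keeps the wire-level steps visible. `curl -N` against the same server should print the events one by one as they are flushed.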

Chunking is a two-way street. The HTTP protocol also allows the client to chunk HTTP 
requests, which lets it stream the request body. This is useful for uploading large files. 
However, not many servers (NGINX being an exception) support this feature, and most 
streaming upload implementations rely on JavaScript libraries to cut up a binary file and 
send it to the server in pieces. Using JavaScript gives you more control over the uploading 
experience, but using the HTTP protocol directly would be the simplest.
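Python's standard library can demonstrate a chunked upload end to end. The client side relies on documented `http.client` behaviour (Python 3.6+): when the request body is an iterable and no `Content-Length` is given, it sends `Transfer-Encoding: chunked` and chunk-encodes each piece. `http.server` does not decode chunked request bodies, so the small receiving handler parses them by hand. All names here (`ChunkedUploadHandler`, `file_pieces`, `upload`, the `/upload` path) are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ChunkedUploadHandler(BaseHTTPRequestHandler):
    """Receives a chunked POST body and replies with the number of bytes seen."""

    def do_POST(self):
        received = 0
        if self.headers.get("Transfer-Encoding", "").lower() == "chunked":
            while True:
                size_line = self.rfile.readline()
                size = int(size_line.split(b";", 1)[0].strip(), 16)  # ignore extensions
                if size == 0:
                    while self.rfile.readline() not in (b"\r\n", b"\n", b""):
                        pass  # skip optional trailer fields
                    break
                received += len(self.rfile.read(size))  # a real server would process it here
                self.rfile.readline()  # CRLF that follows the chunk data
        reply = str(received).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_receiver() -> ThreadingHTTPServer:
    """Start the receiving server on a free local port in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ChunkedUploadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def file_pieces(path: str, size: int = 64 * 1024):
    """Read a file lazily, one piece at a time, so it never sits in memory whole."""
    with open(path, "rb") as f:
        while piece := f.read(size):
            yield piece


def upload(host: str, port: int, pieces) -> bytes:
    """POST an iterable of byte strings; http.client sends it as a chunked body."""
    conn = http.client.HTTPConnection(host, port)
    conn.request("POST", "/upload", body=pieces)  # no Content-Length, so chunked
    reply = conn.getresponse().read()
    conn.close()
    return reply
```

Usage would be along the lines of `upload("127.0.0.1", port, file_pieces("big.iso"))`, where the file name is a placeholder.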

Browsers natively support chunked data, so if your server sends chunked data, they 
will start rendering it as it arrives. However, a browser needs to receive a minimum 
amount of data before it starts rendering. This limit differs between browsers, but it 
is generally around 1KB. You can see the limits for various browsers in 
[this Stack Overflow answer](http://stackoverflow.com/a/16909228/582917).

If, however, you want to consume an API that supports streaming, you need to be aware of 
how your HTTP library handles chunked data. In most cases, you'll need to attach a 
callback handler that executes upon each chunk of data. For example, in PHP you can use the 
[Guzzle library or `curl`](http://mtdowling.com/blog/2012/01/27/chunked-encoding-in-php-with-guzzle/). 
This means the API needs to frame each chunk in a useful manner. If the API splits its data 
into too many chunks, you may end up needing to buffer them into a "semantic protocol data 
unit" (PDU) before you can work on it, which of course defeats the purpose of chunking in the 
first place. Also keep in mind that chunk boundaries are hop-by-hop: a proxy may re-chunk the 
stream, so any framing the consumer depends on should live inside the data itself.
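Here is what buffering chunks back into semantic units can look like. This Python sketch assumes the API frames its messages as newline-delimited records, which is an assumption since the article doesn't prescribe a format. `frames` is a hypothetical helper that reassembles records regardless of where the chunk boundaries fall.

```python
from typing import Iterable, Iterator


def frames(chunks: Iterable[bytes], delimiter: bytes = b"\n") -> Iterator[bytes]:
    """Reassemble arbitrary chunk boundaries into delimiter-terminated records."""
    buffer = bytearray()
    for chunk in chunks:  # e.g. fed from your HTTP library's per-chunk callback
        buffer.extend(chunk)
        # Emit every complete record currently held in the buffer.
        while (end := buffer.find(delimiter)) != -1:
            yield bytes(buffer[:end])
            del buffer[:end + len(delimiter)]
    if buffer:
        yield bytes(buffer)  # trailing record without a final delimiter
```

The two chunks `b'{"a":1}\n{"b"'` and `b':2}\n{"c":3}'` come out as the three records `{"a":1}`, `{"b":2}` and `{"c":3}`.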

When considering performance, make sure you're not splitting the data into too many small 
chunks. The more "chunking" you do, the more overhead there is in both producing and parsing 
the chunks. It also results in more calls to buffering functions if the receiver can't make 
immediate use of the chunks. Chunking isn't always the right answer; it adds extra complexity 
for the recipient. So if you're sending small units of data that won't gain much from 
streaming, don't bother with it!
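To put a number on the wire-level part of that overhead, this Python sketch (`chunked_overhead` is a hypothetical name) counts the framing bytes that chunked encoding adds when a body is split into equal-sized chunks. It ignores chunk extensions, trailers and the CPU cost of producing and parsing each chunk.

```python
def chunked_overhead(body_size: int, chunk_size: int) -> int:
    """Framing bytes added when body_size bytes are sent in chunk_size pieces."""
    full, rest = divmod(body_size, chunk_size)
    sizes = [chunk_size] * full + ([rest] if rest else [])
    # Each chunk adds "<hex size>\r\n" before its data and "\r\n" after it.
    framing = sum(len(f"{n:x}") + 4 for n in sizes)
    return framing + len(b"0\r\n\r\n")  # the terminating zero-length chunk
```

For a 1,000,000 byte body, 16-byte chunks add 375,005 bytes of framing (about 37.5%), while 64KiB chunks add only 148 bytes.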

Do note that byte serving is compatible with chunked encoding. This applies where you know 
the total content length and want to allow partial or resumable downloads, but still want 
to stream each partial response to the client.

## Content-Encoding ##

It is also possible to compress chunked or non-chunked data. In practice this is 
done via the `Content-Encoding` header.

Note that the `Content-Length` is equal to the length of the body after the 
`Content-Encoding`. This means if you have gzipped your response, then the length 
calculation happens after compression. You will need to be able to load the entire 
body in memory if you want to calculate the length (unless you have that information 
elsewhere).

When streaming using chunked encoding, the compression algorithm must also support 
online processing. Thankfully, gzip supports stream compression. The content gets 
compressed first, and the compressed stream is then cut up into chunks. The receiver 
reassembles the chunks, then decompresses them to acquire the real content. The other 
way around wouldn't make sense: decompressing the stream would give you back chunks 
rather than content.
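Python's `zlib` module can illustrate compress-then-chunk. In this sketch (function names are hypothetical), `wbits=31` selects the gzip container. Calling `Z_SYNC_FLUSH` after each piece forces out everything compressed so far without ending the stream, so each output block could be sent as its own chunk and decompressed as soon as it arrives. The trade-off is that frequent flushes slightly reduce the compression ratio.

```python
import zlib
from typing import Iterable, Iterator


def gzip_stream(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Compress pieces into one gzip stream, emitting output after every piece."""
    compressor = zlib.compressobj(wbits=31)  # 16 + 15: gzip header and trailer
    for piece in pieces:
        # Z_SYNC_FLUSH pushes out everything compressed so far without ending the
        # stream, so this block can go out immediately as its own HTTP chunk.
        block = compressor.compress(piece) + compressor.flush(zlib.Z_SYNC_FLUSH)
        if block:
            yield block
    yield compressor.flush()  # Z_FINISH: final deflate block plus the gzip trailer


def gunzip_stream(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress a gzip stream block by block, as the chunks arrive."""
    decompressor = zlib.decompressobj(wbits=31)
    for block in blocks:
        data = decompressor.decompress(block)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail
```

The receiver can decode the first block on its own, before the rest of the stream exists.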

A typical compressed stream response may have these headers:

```
Content-Type: text/html
Content-Encoding: gzip
Transfer-Encoding: chunked
```

Semantically, the use of `Content-Encoding` indicates an "end to end" encoding 
scheme, which means only the final client or final server is supposed to decode the 
content. Proxies in the middle are not supposed to decode the content.

If you want to allow proxies in the middle to decode the content, the correct header 
to use is in fact the `Transfer-Encoding` header. If the HTTP request carried a 
`TE: gzip` header (`chunked` is always acceptable and isn't listed), then it is legal to 
respond with `Transfer-Encoding: gzip, chunked`, where `chunked` must come last.

However, this is very rarely supported, so for now you should only use `Content-Encoding` 
for your compression.

## Buffering Problem ##

The biggest problem when implementing HTTP streaming is understanding the effect of 
buffering. Buffering is the practice of accumulating reads or writes into a temporary 
fixed memory space. The main advantage of buffering is reduced read or write call 
overhead. For example, instead of writing 1KB 4096 times, you can write 4096KB once. 
This means your program can create a write buffer holding 4096KB of temporary data 
(which can be aligned to the disk block size), and once the space limit is reached, 
the buffer is flushed to disk.
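The write buffer just described can be sketched in a few lines of Python. `WriteBuffer` is a hypothetical class, and the sink is any callable that accepts bytes, such as a file object's `write` method. The same mechanism is what delays streamed data: anything written sits in `pending` until the limit is reached or someone calls `flush()`.

```python
from typing import Callable


class WriteBuffer:
    """Accumulates small writes and hands them downstream as one large block."""

    def __init__(self, sink: Callable[[bytes], object], limit: int = 4096 * 1024):
        self.sink = sink          # e.g. a file object's write method
        self.limit = limit        # 4096KB, as in the example above
        self.pending = bytearray()
        self.flushes = 0

    def write(self, data: bytes) -> None:
        self.pending.extend(data)
        if len(self.pending) >= self.limit:
            self.flush()          # limit reached: one big write instead of many small ones

    def flush(self) -> None:
        if self.pending:
            self.sink(bytes(self.pending))
            self.pending.clear()
            self.flushes += 1
```

Writing 1KB 4096 times results in exactly one call to the sink. A single small write afterwards is held back until the next explicit `flush()`, which is the behaviour that hurts streaming.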

Typical HTTP architectures include these components:

```
Client <--> Proxy <--> HTTP Server <--> Application Server <--> Database Server
```

Each one of these components can possess adjustable and varied buffering styles and 
limits.

To perform streaming correctly, you have to know and adjust the buffering limits at 
each component.

For example, let's investigate a typical PHP stack such as:

```
Browser <--> Proxy <--> NGINX <--> PHP <--> MySQL
```

### The Client ###

Firstly, browsers have a [rendering buffer limit](http://stackoverflow.com/a/16909228/582917). 
You must send at least that much data before the browser will render the content. 
Chunks smaller than the buffer will just be held by the browser until either the buffer 
is full or the connection is closed (or some time limit passes).

### The Proxies ###

At the proxy level, this could be your ISP or some custom proxy. If the proxy buffers data, 
your streamed data from upstream will accumulate in the proxy's buffer before being sent 
to the browser. Some mobile wireless ISPs buffer traffic and you won't be able to control 
this behaviour. This is a violation of the [end to end principle](https://en.wikipedia.org/wiki/End-to-end_principle), 
and technically there's nothing you can do about it.

### The Web Server ###

At the NGINX level, buffering depends on the type of the upstream connection. There 
are 3 common upstream connection types: "proxy", "uwsgi" and "fastcgi". If you want your NGINX 
server to respect streaming, you can either switch off buffering for your connection type, or 
match the buffer size with the upstream chunk size. Switching off buffering can be done 
using a buffering directive (`proxy_buffering`, `uwsgi_buffering`, `fastcgi_buffering`), or 
you can use a special header `X-Accel-Buffering: no` which tells NGINX not to buffer the 
response. The special header is more flexible, as it lets NGINX keep buffering responses that 
don't need streaming. It also works for all 3 connection types.
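For a Python application behind NGINX (over uwsgi or a plain proxy connection), the per-response opt-out can be sketched as a WSGI app. The app name and the `event N` lines are hypothetical; the only NGINX-specific part is the `X-Accel-Buffering: no` response header. PEP 3333 requires the WSGI server not to delay transmission of each yielded block. Because no `Content-Length` is set, an HTTP/1.1 front end is free to send the body chunked.

```python
import time


def streaming_app(environ, start_response):
    """WSGI app whose response NGINX is asked not to buffer."""
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Per-response opt-out: other responses from this server stay buffered.
        ("X-Accel-Buffering", "no"),
    ])
    for i in range(3):
        # Each yielded block is handed to the WSGI server as soon as it is produced.
        yield f"event {i}\n".encode("utf-8")
        time.sleep(0.05)  # stand-in for slow work between events
```

Every other route in the same application keeps NGINX's normal buffering, which is the flexibility the header buys over switching buffering off globally.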

If you instead try to match the buffer size with the chunk size, you have to make sure that 
the number of buffers multiplied by the buffer size (each buffer being one system memory page) 
equals a single chunk size. If it is greater than a single chunk from upstream, your chunks 
will be accumulated before they are sent downstream. If it is less than the chunk size, NGINX 
will buffer part of the response to disk, which you want to avoid because it adds extra 
overhead when streaming. For more information on [buffer size see this gist](https://gist.github.com/magnetikonline/11312172).

Just a note on buffering optimisation: the larger the total buffer size, the more memory 
each connection is likely to use. If each buffer is large, you may not use it efficiently, 
which can cause [memory fragmentation](https://en.wikipedia.org/wiki/Fragmentation_%28computing%29). 
In the end, each buffer size should match the system memory page size, and the number of 
buffers is what can be dynamically allocated. If your total buffer size across all connections 
exceeds your OS's memory limit, you're either going to hit an OOM error or start paging 
to disk. To maintain NGINX's availability, you have to consider the theoretical number of 
connections that a single NGINX server can handle before it exhausts your server's memory.
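The rules of thumb in the last two paragraphs can be turned into a back-of-the-envelope calculator. This Python sketch (`nginx_buffer_plan` is hypothetical; all sizes in bytes) only encodes the reasoning in this article. It is not how NGINX allocates buffers internally, and the connection estimate ignores every other per-connection cost. For a setting like `proxy_buffers 8 4k`, the per-connection total is 8 × 4096 = 32768 bytes.

```python
def nginx_buffer_plan(num_buffers: int, buffer_size: int, chunk_size: int,
                      memory_budget: int) -> dict:
    """Apply this article's rules of thumb to a *_buffers setting (all sizes in bytes)."""
    per_connection = num_buffers * buffer_size
    if per_connection > chunk_size:
        behaviour = "chunks accumulate before being sent downstream"
    elif per_connection < chunk_size:
        behaviour = "part of each chunk may be buffered to disk"
    else:
        behaviour = "one upstream chunk fills the buffers exactly"
    return {
        "per_connection": per_connection,
        "behaviour": behaviour,
        # Worst case: every connection has all of its buffers full at the same time.
        "max_connections": memory_budget // per_connection,
    }
```

With 32KiB upstream chunks and 1GiB set aside for buffers, `nginx_buffer_plan(8, 4096, 32 * 1024, 1024 ** 3)` reports an exact match and room for 32,768 connections.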

Be aware of the real chunk size after compression. If your upstream is compressing the content, 
the resulting chunk size will be different. In most cases, NGINX should be doing the compression, 
and it does support compressing each chunk of data that arrives from upstream; you just need 
`gzip on`. This means your application layer should not be compressing or chunking the content; 
it should just flush raw data. NGINX will compress the data it receives from upstream, format 
the result into chunks, and flush them downstream.

There's an advantage in keeping buffers available or having a larger buffer size than the 
chunk size: it helps with slow clients. NGINX as a reverse proxy is very fast and can read 
the response from your upstream application server very quickly. NGINX itself can then deal 
with any slow browsers that have a slower read rate than your upstream's write rate. 
Because NGINX is very lightweight (asynchronous IO), the cost of holding a connection open in 
NGINX is far smaller than holding open a process (that is waiting for the client to finish 
reading) in your application server. This is of course relative, as your application server 
might also be very lightweight and rely on either green threads or asynchronous IO. This 
problem does reveal an interesting property of streaming systems: any stream will only be as 
quick as the slowest link (reader or writer) in the chain. This problem is related to the 
[back pressure issue in distributed systems](http://engineering.voxer.com/2013/09/16/backpressure-in-nodejs/).
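Back pressure is easy to see with a bounded queue between a fast writer and a slow reader. In this asyncio sketch (all names hypothetical), the queue plays the role of a proxy buffer. Once it holds `maxsize` items, `queue.put()` makes the producer wait, so the whole stream advances at the consumer's pace.

```python
import asyncio
import time


async def produce(queue: asyncio.Queue, count: int) -> None:
    for i in range(count):
        await queue.put(i)  # waits whenever the queue is full: this is the back pressure
    await queue.put(None)   # end-of-stream marker


async def consume(queue: asyncio.Queue, delay: float, seen: list) -> None:
    while (item := await queue.get()) is not None:
        await asyncio.sleep(delay)  # a slow reader, e.g. a client on a slow link
        seen.append(item)


async def stream_through_queue(count: int = 10, delay: float = 0.02,
                               maxsize: int = 2) -> tuple:
    queue = asyncio.Queue(maxsize=maxsize)  # bounded buffer between writer and reader
    seen: list = []
    start = time.perf_counter()
    await asyncio.gather(produce(queue, count), consume(queue, delay, seen))
    return seen, time.perf_counter() - start
```

However fast `produce` could go, `asyncio.run(stream_through_queue())` takes at least `count * delay` seconds (about 0.2 s with the defaults), because the writer can never get more than `maxsize` items ahead of the reader.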

To take advantage of NGINX's ability to handle slow clients while still streaming data as 
fast as possible, you will need to tune both the buffer size and potentially the 
`*_busy_buffers_size` option. You cannot just increase the total buffer size, as that will 
just make NGINX wait until the buffer is full. What you need is some buffer space that is 
allocated only for slow clients. This has something to do with `*_busy_buffers_size`, but 
it is poorly documented, so I do not know how to make this work.

Here are 2 quotes about `*_busy_buffers_size`:

> When buffering of responses from the * server is enabled, limits the total size of buffers that can be busy sending a response to the client while the response is not yet fully read. In the meantime, the rest of the buffers can be used for reading the response and, if needed, buffering part of the response to a temporary file. By default, size is limited by the size of two buffers set by the *_buffer_size and *_buffers directives. 
> 
> - NGINX documentation

> proxy_busy_buffers_size: This directive sets the maximum size of buffers that can be marked "client-ready" and thus busy. While a client can only read the data from one buffer at a time, buffers are placed in a queue to send to the client in bunches. This directive controls the size of the buffer space allowed to be in this state.
> 
> - https://www.digitalocean.com/community/tutorials/understanding-nginx-http-proxying-load-balancing-buffering-and-caching

### The Application Server ###

At the PHP level, global buffers can be set inside the `php.ini` configuration file. There are 
3 relevant options: `output_buffering`, `output_handler` and `implicit_flush`. They 
are explained in the [output control section of the PHP documentation](http://php.net/manual/en/outcontrol.configuration.php).
It is interesting to note that for CLI applications, output buffering is off by default, 
so that your CLI application can show you results as it's running. This buffer is controlled 
by the server application programming interface ("SAPI"). You can control it from inside your 
application by calling `flush()`, which will flush the entire SAPI buffer.

During runtime, custom buffers can also be created using `ob_start()`. Once you have added content 
to the buffer, you can then flush your custom buffer using `ob_flush()`. This only flushes the buffer 
that you created using `ob_start()`. Think of `ob_start()` as a kind of PHP-specific manual 
memory management. You're basically asking for a block of memory (fixed or variable size), which 
is then used only by your output statements and functions, such as `echo` and `print`.

If you are using both levels of buffering, you need to call the flush functions in this order: 
`ob_flush(); flush();`.
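The order matters because the layers are nested. `ob_flush()` moves content from your own buffer into the next layer down, and `flush()` only pushes out what has already reached that layer. Here is a toy Python model of the two layers. It is not PHP, and `OutputLayer` and `demo` are hypothetical.

```python
from typing import Callable, List


class OutputLayer:
    """Toy model of one buffering layer: it holds output until flushed downstream."""

    def __init__(self, downstream: Callable[[str], None]):
        self.downstream = downstream
        self.pending: List[str] = []

    def write(self, text: str) -> None:
        self.pending.append(text)

    def flush(self) -> None:
        if self.pending:
            self.downstream("".join(self.pending))
            self.pending.clear()


def demo() -> tuple:
    client: List[str] = []               # what actually reached the client
    sapi = OutputLayer(client.append)    # plays the SAPI buffer, flushed by flush()
    ob = OutputLayer(sapi.write)         # plays an ob_start() buffer, flushed by ob_flush()
    ob.write("hello")                    # like echo "hello";
    sapi.flush()                         # flush() alone: nothing reaches the client yet
    after_flush_only = list(client)
    ob.flush()                           # ob_flush(): "hello" moves down into the SAPI layer
    sapi.flush()                         # flush(): and now out to the client
    return after_flush_only, client
```

`demo()` returns `([], ["hello"])`. Calling only the outer flush delivers nothing, while calling both flushes in order delivers everything.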

Both the global SAPI buffer and the custom application buffer have settings that enable automatic 
flushing. This can depend on hitting the buffer limit, or on some function call. Check the 
documentation for more.

### The Upstream Data Source ###

Finally we reach the MySQL level. This can be replaced with any upstream data source that you 
are calling in order to prepare a response. By default, PHP's MySQL extensions buffer query 
results. There are 2 options for working with the data as a stream instead (reads and writes). 
The first is the [unbuffered query option](http://us.php.net/manual/en/mysqlinfo.concepts.buffering.php). 
This lets you read large result sets and process each row as it arrives (including flushing it 
to the client). The second option works with just a single column of data. This is useful where 
a single column contains large binary or textual content, and you want to work with that data 
specifically as a stream. This involves the [large object option](http://php.net/manual/en/pdo.lobs.php). 
You can also use the large object option to stream a large binary or textual value into the 
database. Streaming writes of whole rows is simply done by running multiple insert queries.

With regard to the second method, there are some peculiarities you have to keep in mind; see 
[this Percona blog post](https://www.percona.com/blog/2007/07/06/php-large-result-sets-and-summary-tables/).
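The row-at-a-time idea carries over to other drivers. As a stand-in for MySQL, this sketch uses Python's built-in `sqlite3` module with a hypothetical `events` table. `fetchmany()` pulls a small batch at a time instead of `fetchall()` loading the whole result set, so each row can be sent on before the rest have been read. Most MySQL drivers buffer results by default, so with them you would need their unbuffered (server-side cursor) mode to get the same effect.

```python
import sqlite3
from typing import Iterator, Tuple


def stream_rows(conn: sqlite3.Connection, batch: int = 100) -> Iterator[Tuple[int, str]]:
    """Yield rows a small batch at a time instead of fetching the whole result set."""
    cursor = conn.execute("SELECT id, payload FROM events ORDER BY id")
    while rows := cursor.fetchmany(batch):
        # Each row can be written and flushed to the client before the next batch is read.
        yield from rows
```

Combined with a chunked response, each row can be encoded and flushed as soon as `stream_rows` yields it.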

## A Note About NodeJS ##

NodeJS has great support for streaming. In fact its entire native HTTP module does streaming by 
default for both incoming requests and outgoing responses. Every time you call `response.write`, 
it writes a chunk of data (`response.writeHead` only sets the status code and headers). There 
may also be buffering inside NodeJS governed by the `highWaterMark` setting of writable streams, 
but I have not looked into this further.

NodeJS has a native [stream module](https://nodejs.org/api/stream.html) that serves as the base 
for its other IO modules.
