
View first page before entire document is loaded - support range header


Before you start - checklist

I have read documentation in README
I have checked sample and test suites to see real life basic implementation
I have checked that this question has not already been asked

What are you trying to achieve? Please describe.

In our project (issue, demo), I’d like to load only the pages that I’m viewing, and render the first page before the entire document is loaded.

As I understand it, PDF.js supports HTTP Range requests, and the react-pdf API documentation says that the file prop can be a PDFDataRangeTransport object. However, I can't work out what I actually need to do to make it send Range headers.
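To make "load only the pages I'm viewing" concrete: a range-capable loader does not ask for pages, it asks for byte ranges, usually in fixed-size chunks. The sketch below is a simplified model of that, not PDF.js's actual code. It assumes a 64 KiB chunk size (which I believe is PDF.js's default rangeChunkSize); the names CHUNK_SIZE, chunk_ranges and fetch_range are hypothetical.

```python
import urllib.request

# Assumption: 64 KiB, which I believe is PDF.js's default rangeChunkSize.
CHUNK_SIZE = 65536


def chunk_ranges(begin: int, end: int, file_size: int,
                 chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Range header values for the chunks that cover bytes [begin, end) of a file."""
    end = min(end, file_size)
    if end <= begin:
        return []
    first = begin // chunk_size
    last = (end - 1) // chunk_size
    values = []
    for chunk in range(first, last + 1):
        start = chunk * chunk_size
        stop = min(start + chunk_size, file_size) - 1  # HTTP range ends are inclusive
        values.append(f"bytes={start}-{stop}")
    return values


def fetch_range(url: str, range_value: str) -> bytes:
    """Download one byte range; a range-capable server answers 206 Partial Content."""
    request = urllib.request.Request(url, headers={"Range": range_value})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

For example, chunk_ranges(65000, 70000, file_size) yields two headers, "bytes=0-65535" and "bytes=65536-131071", because the requested span straddles a chunk boundary. Each value could then be passed to fetch_range("https://example.com/some.pdf", value). In a web-optimized PDF the data for the first page sits near the start of the file, so the first chunk or two can be enough to render it.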

Describe solutions you’ve tried

Checked that the source PDF is optimized for the web (linearized)
Checked that the hosting service supports HTTP Range requests
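For the first check, a quick heuristic is possible because the PDF specification requires the linearization parameter dictionary of a linearized ("Fast Web View") file to lie entirely within its first 1024 bytes. The sketch below only looks for that dictionary; it does not validate the hint tables or notice a later incremental update that breaks linearization, so a dedicated tool such as qpdf's --check-linearization is more reliable. The function names are hypothetical.

```python
# Per the PDF specification, the linearization parameter dictionary must sit
# entirely within the first 1024 bytes of a linearized ("Fast Web View") file.
HEAD_SIZE = 1024


def looks_linearized(head: bytes) -> bool:
    """Heuristic: does this file header carry a linearization dictionary?"""
    head = head[:HEAD_SIZE]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


def is_linearized(path: str) -> bool:
    """Read only the first 1024 bytes of a local PDF and apply the heuristic."""
    with open(path, "rb") as f:
        return looks_linearized(f.read(HEAD_SIZE))
```

A False result strongly suggests the first page cannot be rendered from the start of the file alone, even when the server handles Range requests correctly.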

Environment

Chrome 75
MacOS 10.14.5
React-PDF 4.0.5
React-scripts 3.0.1
React 16.8.6

Issue Analytics

Created 4 years ago
Comments: 11 (3 by maintainers)

Top GitHub Comments

joepio commented on Aug 5, 2019 (7 reactions)

According to the PDF.js developers, PDF.js does not support gzip-encoded range responses, so the request has to ask explicitly for an unencoded (identity) response. The PDF.js docs say you can set custom request headers, and since Document passes its options object on to pdfjs.getDocument, this should work:

<Document
  options={{
    // Ask the server for an unencoded (identity) response
    httpHeaders: {
      'Accept-Encoding': 'identity',
    },
  }}
  file="https://example.com/some.pdf"
/>

However, it does not, so I’m still investigating what is going on. It seems likely that it’s a pdf.js issue.
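One likely reason the snippet has no effect is that Accept-Encoding is on the Fetch standard's list of forbidden request headers, so browsers do not let page scripts set it; in practice the encoding has to be controlled on the server. The sketch below inspects the headers a server returns to a browser-like request. It approximates, rather than reproduces, the checks PDF.js makes before switching to range requests: Accept-Ranges must be "bytes", the body must not be content-encoded, and Content-Length must be present. The two-chunk size threshold is my reading of the PDF.js source and may differ between versions; range_problems and probe are hypothetical names.

```python
import urllib.request

# Assumption: mirrors what I believe is PDF.js's default rangeChunkSize (64 KiB).
DEFAULT_RANGE_CHUNK_SIZE = 65536


def range_problems(headers: dict[str, str],
                   chunk_size: int = DEFAULT_RANGE_CHUNK_SIZE) -> list[str]:
    """Reasons a full-GET response would probably not lead to range loading.

    `headers` must use lower-case names. An empty list means nothing was found.
    """
    problems = []
    if headers.get("accept-ranges", "").strip().lower() != "bytes":
        problems.append("missing 'Accept-Ranges: bytes'")
    encoding = headers.get("content-encoding", "identity").strip().lower()
    if encoding != "identity":
        problems.append(f"body is encoded with {encoding!r}")
    length = headers.get("content-length", "").strip()
    if not length.isdigit():
        problems.append("missing or invalid Content-Length")
    elif int(length) <= 2 * chunk_size:
        # Assumption: small files are simply downloaded whole.
        problems.append("file too small for range loading to be worthwhile")
    return problems


def probe(url: str) -> list[str]:
    """Request `url` like a browser would (gzip allowed) and check only the headers."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate, br"})
    with urllib.request.urlopen(request) as response:  # the body is never read
        headers = {name.lower(): value for name, value in response.headers.items()}
    return range_problems(headers)
```

Running probe("https://example.com/some.pdf") outside the browser shows whether the server or a proxy compresses the PDF. For cross-origin requests the browser additionally has to be allowed to read these headers via Access-Control-Expose-Headers, which a script like this cannot observe.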

angel-langdon commented on Mar 1, 2022 (1 reaction)

@joepio Well, I finally managed to do it. It was failing because our backend implementation was not compatible with PDF.js.

Frontend component

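import { memo, useMemo } from 'react';
import { Document } from 'react-pdf';

interface MemoizedDocumentProps {
  url: string;
  children: JSX.Element | null;
}

// Memoize the `file` object so that <Document> only reloads the PDF
// when the URL actually changes, not on every re-render.
const MemoizedDocument = memo((props: MemoizedDocumentProps) => {
  const file = useMemo(
    () => ({ url: props.url }),
    [props.url]
  );
  return (
    <Document file={file}>
      {props.children}
    </Document>
  );
});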

Backend implementation (in Python)

import os
from typing import BinaryIO

from fastapi import FastAPI, HTTPException, Request, status
from fastapi.responses import StreamingResponse


def send_bytes_range_requests(
    file_obj: BinaryIO, start: int, end: int, chunk_size: int = 10_000
):
    """Send a file in chunks using Range Requests specification RFC7233

    `start` and `end` parameters are inclusive due to specification
    """
    with file_obj as f:
        f.seek(start)
        while (pos := f.tell()) <= end:
            read_size = min(chunk_size, end + 1 - pos)
            yield f.read(read_size)


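def _get_range_header(range_header: str, file_size: int) -> tuple[int, int]:
    """Parse a single range such as "bytes=0-499", "bytes=500-" or "bytes=-500".

    Returns an inclusive (start, end) pair, clamped to the file size.
    """

    def _invalid_range():
        return HTTPException(
            status.HTTP_416_REQUESTED_RANGE_NOT_SATISFIABLE,
            detail=f"Invalid request range (Range:{range_header!r})",
            # RFC 7233: a 416 response should report the current length
            headers={"content-range": f"bytes */{file_size}"},
        )

    try:
        h = range_header.replace("bytes=", "").split("-")
        if h[0] == "":
            # Suffix range "bytes=-N" means the LAST N bytes, not bytes 0..N
            start = max(file_size - int(h[1]), 0)
            end = file_size - 1
        else:
            start = int(h[0])
            # An open-ended or too-long range is clamped to the last byte
            end = min(int(h[1]), file_size - 1) if h[1] != "" else file_size - 1
    except (ValueError, IndexError):
        raise _invalid_range()

    if start > end or start < 0:
        raise _invalid_range()
    return start, end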


def range_requests_response(
    request: Request, file_path: str, content_type: str
):
    """Returns StreamingResponse using Range Requests of a given file"""

    file_size = os.stat(file_path).st_size
    range_header = request.headers.get("range")

    headers = {
        "content-type": content_type,
        "accept-ranges": "bytes",
        "content-encoding": "identity",
        "content-length": str(file_size),
        "access-control-expose-headers": (
            "content-type, accept-ranges, content-length, "
            "content-range, content-encoding"
        ),
    }
    start = 0
    end = file_size - 1
    status_code = status.HTTP_200_OK

    if range_header is not None:
        start, end = _get_range_header(range_header, file_size)
        size = end - start + 1
        headers["content-length"] = str(size)
        headers["content-range"] = f"bytes {start}-{end}/{file_size}"
        status_code = status.HTTP_206_PARTIAL_CONTENT

    return StreamingResponse(
        send_bytes_range_requests(open(file_path, mode="rb"), start, end),
        headers=headers,
        status_code=status_code,
    )


app = FastAPI()


@app.get("/video")
def get_video(request: Request):
    return range_requests_response(
        request, file_path="path_to_my_video.mp4", content_type="video/mp4"
    )

I would strongly recommend reading the Range Requests RFC (https://datatracker.ietf.org/doc/html/rfc7233) to understand everything; there are a few gotchas.
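A few of those gotchas, shown as a small standalone resolver for a single byte range. This is an illustrative sketch of the RFC 7233 rules (since folded into RFC 9110), not code from the thread; resolve_range is a hypothetical name. It distinguishes an unsatisfiable range, which should get a 416 with "Content-Range: bytes */size", from invalid syntax, which a server may simply ignore by sending the whole file with 200.

```python
from typing import Optional


def resolve_range(value: str, size: int) -> Optional[tuple[int, int]]:
    """Resolve one Range header value against a representation of `size` bytes.

    Returns an inclusive (first, last) pair, or None if the range is
    unsatisfiable. Raises ValueError for syntax the server may ignore.
    """
    unit, _, spec = value.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        raise ValueError(f"unsupported Range value: {value!r}")
    first_s, sep, last_s = spec.strip().partition("-")
    if not sep:
        raise ValueError(f"malformed Range value: {value!r}")
    if first_s == "":
        # Gotcha 1: "bytes=-N" is a suffix range, the LAST N bytes.
        suffix = int(last_s)
        if suffix == 0 or size == 0:
            return None
        return max(size - suffix, 0), size - 1
    first = int(first_s)
    if last_s != "" and int(last_s) < first:
        # Gotcha 2: last < first is invalid syntax, not an unsatisfiable range.
        raise ValueError(f"last position before first: {value!r}")
    # Gotcha 3: the end is inclusive and is clamped to the last byte.
    last = size - 1 if last_s == "" else min(int(last_s), size - 1)
    if first >= size:
        # Gotcha 4: starting at or past the end is unsatisfiable (416).
        return None
    return first, last
```

Because the end is inclusive, the Content-Length of a 206 response is last - first + 1; for "bytes=0-499" that is 500 bytes, not 499.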
