View first page before entire document is loaded - support range header


Before you start - checklist

  • I have read documentation in README
  • I have checked sample and test suites to see real life basic implementation
  • I have checked if this question is not already asked

What are you trying to achieve? Please describe.

In our project (issue, demo), I’d like to load only the pages that I’m viewing, and render the first page before the entire document is loaded.

From my understanding, PDF.js supports Range headers and the react-pdf API describes that it’s possible to include a PDFDataRangeTransport object in the file property. I fail to see what to do to actually send these Range headers, though!

Describe solutions you’ve tried

  • Check if the source PDF is optimized for the web
  • Check if the hosting service supports HTTP Range headers
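Both checks above can be scripted. The following is a rough sketch, not something taken from react-pdf or PDF.js: the function names are made up for illustration, and the linearization test is only a heuristic that looks for the /Linearized key a web-optimized PDF carries near the start of the file.

```python
import urllib.request


def looks_linearized(head: bytes) -> bool:
    """Heuristic: a web-optimized (linearized) PDF has a /Linearized key
    in a dictionary near the start of the file, so the first 1024 bytes
    are enough to look at."""
    window = head[:1024]
    return b"%PDF-" in window and b"/Linearized" in window


def honours_range(status: int, headers: dict) -> bool:
    """A server that honours a Range request answers 206 Partial Content
    and includes a Content-Range header (header names are case-insensitive)."""
    names = {name.lower() for name in headers}
    return status == 206 and "content-range" in names


def probe(url: str) -> dict:
    """Ask for the first 1024 bytes of `url` and run both checks.

    `url` is whatever PDF you want to test; this performs a real HTTP request.
    """
    request = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})
    with urllib.request.urlopen(request) as response:
        head = response.read(1024)
        return {
            "range_supported": honours_range(
                response.status, dict(response.headers)
            ),
            "linearized": looks_linearized(head),
        }
```

Calling probe() with the URL of the PDF returns two booleans. A server that ignores the Range header typically answers 200 with the whole body, which shows up here as range_supported being False.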

Environment

  • Chrome 75
  • MacOS 10.14.5
  • React-PDF 4.0.5
  • React-scripts 3.0.1
  • React 16.8.6

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:11 (3 by maintainers)

Top GitHub Comments

joepio commented, Aug 5, 2019 (7 reactions)

According to the PDF.js developers, PDF.js does not support gzip-encoded range responses, so the encoding has to be requested explicitly. According to the PDF.js docs, you can set custom headers. Since Document passes its options object to PDFjs.getDocument, this should work:

<Document
  options={{
    httpHeaders: {
      'Accept-Encoding': 'Identity',
    }
  }}
  file={"https://example.com/some.pdf"}
>

However, it does not, so I’m still investigating what is going on. It seems likely that it’s a pdf.js issue.

angel-langdon commented, Mar 1, 2022 (1 reaction)

@joepio Well, I finally managed to do it. It was failing because our backend implementation was not compatible with pdf.js.

Frontend component

import { memo, useMemo } from "react";
import { Document } from "react-pdf";

interface MemoizedDocumentProps {
  url: string;
  children: JSX.Element | null;
}

const MemoizedDocument = memo((props: MemoizedDocumentProps) => {
  const file = useMemo(
    () => ({ url: props.url }),
    [props.url]
  );
  return (
    <Document
      file={file}
    >
      {props.children}
    </Document>
  );
});

Backend implementation (in Python)

import os
from typing import BinaryIO

from fastapi import FastAPI, HTTPException, Request, status
from fastapi.responses import StreamingResponse


def send_bytes_range_requests(
    file_obj: BinaryIO, start: int, end: int, chunk_size: int = 10_000
):
    """Send a file in chunks using Range Requests specification RFC7233

    `start` and `end` parameters are inclusive due to specification
    """
    with file_obj as f:
        f.seek(start)
        while (pos := f.tell()) <= end:
            read_size = min(chunk_size, end + 1 - pos)
            yield f.read(read_size)


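def _get_range_header(range_header: str, file_size: int) -> tuple[int, int]:
    def _invalid_range():
        return HTTPException(
            status.HTTP_416_REQUESTED_RANGE_NOT_SATISFIABLE,
            detail=f"Invalid request range (Range:{range_header!r})",
        )

    try:
        # Unpacking raises ValueError for malformed or multi-range headers
        first, last = range_header.replace("bytes=", "").split("-")
        if first == "":
            # Suffix range ("bytes=-N") means the last N bytes of the file
            start = max(file_size - int(last), 0)
            end = file_size - 1
        else:
            start = int(first)
            # Open-ended or overlong ranges are clamped to the last byte
            end = min(int(last), file_size - 1) if last != "" else file_size - 1
    except ValueError:
        raise _invalid_range()

    if start > end or start < 0:
        raise _invalid_range()
    return start, end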


def range_requests_response(
    request: Request, file_path: str, content_type: str
):
    """Returns StreamingResponse using Range Requests of a given file"""

    file_size = os.stat(file_path).st_size
    range_header = request.headers.get("range")

    headers = {
        "content-type": content_type,
        "accept-ranges": "bytes",
        "content-encoding": "identity",
        "content-length": str(file_size),
        "access-control-expose-headers": (
            "content-type, accept-ranges, content-length, "
            "content-range, content-encoding"
        ),
    }
    start = 0
    end = file_size - 1
    status_code = status.HTTP_200_OK

    if range_header is not None:
        start, end = _get_range_header(range_header, file_size)
        size = end - start + 1
        headers["content-length"] = str(size)
        headers["content-range"] = f"bytes {start}-{end}/{file_size}"
        status_code = status.HTTP_206_PARTIAL_CONTENT

    return StreamingResponse(
        send_bytes_range_requests(open(file_path, mode="rb"), start, end),
        headers=headers,
        status_code=status_code,
    )


app = FastAPI()


@app.get("/video")
def get_video(request: Request):
    return range_requests_response(
        request, file_path="path_to_my_video.mp4", content_type="video/mp4"
    )

I would strongly recommend reading the Range Requests RFC (https://datatracker.ietf.org/doc/html/rfc7233) to understand everything. There are a few gotchas.
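To make some of those gotchas concrete, here is a small standalone sketch of how a single-range header maps to byte offsets under RFC 7233. It is independent of the FastAPI code above; resolve_range and content_range are hypothetical helper names, and multi-range requests are deliberately not handled.

```python
def resolve_range(spec: str, file_size: int) -> tuple[int, int]:
    """Resolve a single-range Range header value to inclusive (start, end)."""
    unit, _, byte_range = spec.partition("=")
    if unit.strip() != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    first, dash, last = byte_range.strip().partition("-")
    if not dash:
        raise ValueError(f"malformed range: {spec!r}")
    if first == "":
        # Gotcha 1: "bytes=-N" is a suffix range, i.e. the LAST N bytes,
        # not bytes 0..N.
        length = int(last)
        start, end = max(file_size - length, 0), file_size - 1
    else:
        start = int(first)
        # Gotcha 2: an end past the file is clamped rather than rejected;
        # an omitted end means "until the last byte".
        end = min(int(last), file_size - 1) if last else file_size - 1
    if start > end:
        # Start beyond the file (or a zero-length suffix): answer with 416
        raise ValueError(f"unsatisfiable range: {spec!r}")
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Build the Content-Range value that goes with a 206 response."""
    return f"bytes {start}-{end}/{file_size}"


# Gotcha 3: both ends are inclusive, so the length is end - start + 1.
start, end = resolve_range("bytes=0-99", 1000)
print(content_range(start, end, 1000), end - start + 1)  # bytes 0-99/1000 100
```

For a 1000-byte file, "bytes=900-" and "bytes=-100" both resolve to (900, 999), and "bytes=500-5000" resolves to (500, 999).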
