[BUG] Slow scan operations

Summary:

I’m using dynamoose for my project, and overall it works well. However, scan operations are several times slower than the same operations using the AWS DocumentClient.

Code sample:

Schema

It’s the same with all my different schemas. Here is an example of the simplest one. None of them use Buffer type.

const userSchema = new dynamoose.Schema(
  {
    id: {
      type: String,
      required: true,
      hashKey: true,
    },
    phoneNumber: {
      type: String,
      index: {
        project: true,
        global: true,
        name: 'UserPhoneNumberIndex',
      },
    },
    name: String,
    status: String,
    email: String,
    // a few more fields with strings, numbers and booleans
  },
  { timestamps: true }
)

Model

const MyModel = dynamoose.model('myTable', userSchema, {
  create: false,
  waitForActive: false,
})

General

// dynamoose
MyModel.scan().all().exec()

// DocumentClient
import * as AWS from 'aws-sdk'
import { PromiseResult } from 'aws-sdk/lib/request'
// `constants` and `getDocumentClient` are project helpers not shown in this issue.

export const scanTable = async <T = unknown>(
  table: string,
  extraParams?: Partial<AWS.DynamoDB.DocumentClient.ScanInput>
): Promise<T[]> => {
  const params: AWS.DynamoDB.DocumentClient.ScanInput = {
    TableName: `${constants.ENV}-${table}`,
    ...extraParams,
  }

  const scanResults: unknown[] = []
  let response: PromiseResult<
    AWS.DynamoDB.DocumentClient.ScanOutput,
    AWS.AWSError
  >
  // Keep requesting pages until DynamoDB stops returning LastEvaluatedKey.
  do {
    // eslint-disable-next-line no-await-in-loop
    response = await getDocumentClient().scan(params).promise()
    // Items is optional in ScanOutput, so fall back to an empty page.
    scanResults.push(...(response.Items ?? []))
    params.ExclusiveStartKey = response.LastEvaluatedKey
  } while (typeof response.LastEvaluatedKey !== 'undefined')

  return scanResults as T[]
}

Current output and behavior (including stack trace):

Example 1: Scanning ~450 items that are ~1 KB each (according to the AWS DynamoDB console).
Using AWS DocumentClient and scanning all: ~400 ms
Using dynamoose scan all: ~4500 ms

Example 2: Scanning ~23000 items that are ~230 bytes each.
Using AWS DocumentClient and scanning all: ~6000 ms
Using dynamoose scan all: ~24000 ms
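For anyone trying to reproduce these numbers, here is a minimal sketch of a timing harness. The helper names (`timeRuns`, `median`, `slowdownFactor`) are hypothetical and not part of dynamoose or the AWS SDK. It assumes that wall-clock time from `Date.now()` is precise enough at these durations and that repeating each scan a few times smooths out Lambda cold starts.

```typescript
// Hypothetical helpers for comparing two scan implementations.
// None of these functions come from dynamoose or the AWS SDK.

/** Run an async operation `runs` times and return each wall-clock duration in ms. */
async function timeRuns(
  operation: () => Promise<unknown>,
  runs: number = 5
): Promise<number[]> {
  const durations: number[] = []
  for (let i = 0; i < runs; i++) {
    const start = Date.now()
    await operation()
    durations.push(Date.now() - start)
  }
  return durations
}

/** Median of a non-empty list; less sensitive to a single slow run than the mean. */
function median(values: number[]): number {
  if (values.length === 0) throw new Error('median of empty list')
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2
}

/** How many times slower `candidateMs` is than `baselineMs`. */
function slowdownFactor(baselineMs: number, candidateMs: number): number {
  return candidateMs / baselineMs
}

// Applied to the approximate figures reported in this issue:
console.log(slowdownFactor(400, 4500))   // Example 1: 11.25
console.log(slowdownFactor(6000, 24000)) // Example 2: 4
```

In a real comparison, something like `median(await timeRuns(() => MyModel.scan().all().exec()))` could be set against the same call wrapped around `scanTable`, with both run in the same Lambda invocation so they share connection and cold-start conditions.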

Expected output and behavior:

Somewhat similar performance

Environment:

Operating System: Amazon Linux
Operating System Version: 2
Node.js version (node -v): v14.x
NPM version (npm -v): 7.18.1
Dynamoose version: 2.8.1

Other information (if applicable):

AWS lambda using the Serverless framework.

Serverless version: 2.48.0
AWS SDK version: 2.952.0
AWS_NODEJS_CONNECTION_REUSE_ENABLED: 1

Other:

[ x ] I have read through the Dynamoose documentation before posting this issue
[ x ] I have searched through the GitHub issues (including closed issues) and pull requests to ensure this issue has not already been raised before
[ x ] I have searched the internet and Stack Overflow to ensure this issue hasn’t been raised or answered before
[ x ] I have tested the code provided and am confident it doesn’t work as intended
[ x ] I have filled out all fields above
[ x ] I am running the latest version of Dynamoose

Issue Analytics

Created: 2 years ago
Reactions: 6
Comments: 14 (6 by maintainers)

Top GitHub Comments

1 reaction

benhegarty commented, Oct 29, 2021

We were experiencing the same performance issues with a large query request. After upgrading to v3 alpha, performance is now on-par with the native client.

1 reaction

FanFataL commented, Oct 13, 2021

Hi,

I’ve got the same problem using Dynamoose on AWS Lambda; on localhost (Ubuntu) it works perfectly.

After some investigation I found what causes the issue: this line https://github.com/dynamoose/dynamoose/blob/a838224d031ba32db6f84d427600beda5ec765ed/lib/DocumentRetriever.ts#L59

Processing 150 records took 10s (sic!)