
Potential for significant perf improvements in large repos

See original GitHub issue

First, thanks for your work on this project 🙂

The current implementation is fairly slow with large repos, for instance vscode, which has around 5,000 files, or typescript, which has around 50,000. It takes about a minute to clone vscode with --singleBranch and --depth 1, and it didn't manage to clone typescript in the ~15 minutes I waited.

By batching the IndexedDB writes (putting all pending writes into a single transaction rather than one transaction per file) and changing the autoinc in CacheFS to increment a counter rather than search the tree for the highest inode (the search makes writing N files O(N^2)), I am able to see vscode clone in ~20 seconds and typescript in about 2 minutes. That is roughly 3x slower than native git for vscode and 6x slower for typescript.

Batching:

diff --git a/idb-keyval.ts b/idb-keyval.ts
index 45a0d97..94920ef 100644
--- a/idb-keyval.ts
+++ b/idb-keyval.ts
@@ -2,10 +2,12 @@ export class Store {
   private _dbp: Promise<IDBDatabase> | undefined;
   readonly _dbName: string;
   readonly _storeName: string;
+  readonly id: string
 
   constructor(dbName = 'keyval-store', readonly storeName = 'keyval') {
     this._dbName = dbName;
     this._storeName = storeName;
+    this.id = `dbName:${dbName};;storeName:${storeName}`
     this._init();
   }
 
@@ -44,6 +46,31 @@ export class Store {
   }
 }
 
+class Batcher<T> {
+  private ongoing: Promise<void> | undefined
+  private items: { item: T, onProcessed: () => void }[] = []
+
+  constructor(private executor: (items: T[]) => Promise<void>) { }
+
+  private async process() {
+    const toProcess = this.items;
+    this.items = [];
+    await this.executor(toProcess.map(({ item }) => item))
+    toProcess.map(({ onProcessed }) => onProcessed())
+    if (this.items.length) {
+      this.ongoing = this.process()
+    } else {
+      this.ongoing = undefined
+    }
+  }
+
+  async queue(item: T): Promise<void> {
+    const result = new Promise<void>((resolve) => this.items.push({ item, onProcessed: resolve }))
+    if (!this.ongoing) this.ongoing = this.process()
+    return result
+  }
+}
+
 let store: Store;
 
 function getDefaultStore() {
@@ -58,10 +85,17 @@ export function get<Type>(key: IDBValidKey, store = getDefaultStore()): Promise<
   }).then(() => req.result);
 }
 
+const setBatchers: Record<string, Batcher<{ key: IDBValidKey, value: any }>> = {}
 export function set(key: IDBValidKey, value: any, store = getDefaultStore()): Promise<void> {
-  return store._withIDBStore('readwrite', store => {
-    store.put(value, key);
-  });
+  if (!setBatchers[store.id]) {
+    setBatchers[store.id] = new Batcher((items) =>
+      store._withIDBStore('readwrite', store => {
+        for (const item of items) {
+          store.put(item.value, item.key)
+        }
+      }))
+  }
+  return setBatchers[store.id].queue({ key, value })
 }
 
 export function update(key: IDBValidKey, updater: (val: any) => any, store = getDefaultStore()): Promise<void> {

Counter:

diff --git a/src/CacheFS.js b/src/CacheFS.js
index ed26c57..0dc6950 100755
--- a/src/CacheFS.js
+++ b/src/CacheFS.js
@@ -5,6 +5,7 @@ const STAT = 0;
 
 module.exports = class CacheFS {
   constructor() {
+    this._maxInode = 0
   }
   _makeRoot(root = new Map()) {
     root.set(STAT, { mode: 0o777, type: "dir", size: 0, ino: 0, mtimeMs: Date.now() });
@@ -38,16 +39,7 @@ module.exports = class CacheFS {
     return count;
   }
   autoinc () {
-    let val = this._maxInode(this._root.get("/")) + 1;
-    return val;
-  }
-  _maxInode(map) {
-    let max = map.get(STAT).ino;
-    for (let [key, val] of map) {
-      if (key === STAT) continue;
-      max = Math.max(max, this._maxInode(val));
-    }
-    return max;
+    return ++this._maxInode;
   }
   print(root = this._root.get("/")) {
     let str = "";

Please let me know if you'd consider incorporating these changes… The batching should be safe. I'm not entirely sure about the autoinc change, but I don't see a reason why it would cause issues: the main difference is that deleting a file would free up its inode value in the original implementation but doesn't here, which shouldn't be a problem AFAIK.
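To make the complexity argument concrete, here is a minimal, self-contained sketch contrasting the two autoinc strategies. The tree layout mirrors CacheFS's Map-of-Maps structure, but the names FsNode and InodeCounter are hypothetical, introduced only for illustration:

```typescript
// Hypothetical stand-in for CacheFS inode allocation, for illustration only.
const STAT = 0;
type FsNode = Map<number | string, any>;

// Original approach: walk the whole tree for the highest inode on every
// allocation, so allocating N inodes costs O(N^2) overall.
function maxInode(map: FsNode): number {
  let max = map.get(STAT).ino;
  for (const [key, val] of map) {
    if (key === STAT) continue;
    max = Math.max(max, maxInode(val));
  }
  return max;
}

// Replacement: a monotonic counter, O(1) per allocation. Freed inodes are
// never reused, which is the behavioral difference flagged above.
class InodeCounter {
  private maxIno = 0;
  autoinc(): number {
    return ++this.maxIno;
  }
}
```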

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

ncortines commented, Dec 8, 2020 (1 reaction)

Another thing to watch out for is memory footprint, especially when dealing with fast-emitting sources.

I think the File System Access API adoption (#28), plus support for writable streams, would bring an important leap in performance.

I would personally put my efforts on that front 😃

JacksonKearl commented, Dec 8, 2020 (1 reaction)

@wmhilton no unfortunately we opted for a different route (downloading via archive endpoints) and are no longer using this project.

The code sample I included in the original post (and below) has an implementation of batching with no extra cost for single operations. Basically, you send operations off immediately if none are in progress, or batch them together while a batch is in flight and send them all off when the in-progress batch finishes. It's the kind of thing you'd think the DB would do for you, but apparently not yet.

It's interesting to note that in my testing, when firing a sequence of N single put requests, the first one won't resolve until the last one is fired, so there is still some sort of batching going on, just much less efficiently than grouping all changes into a single transaction.

The Batcher class:

class Batcher<T> {
  // Batch currently being written, if any.
  private ongoing: Promise<void> | undefined
  // Items queued while a batch is in flight; flushed as the next batch.
  private items: { item: T, onProcessed: () => void }[] = []

  constructor(private executor: (items: T[]) => Promise<void>) { }

  private async process() {
    // Take everything queued so far and run it as one batch.
    const toProcess = this.items;
    this.items = [];
    await this.executor(toProcess.map(({ item }) => item))
    toProcess.map(({ onProcessed }) => onProcessed())
    // If more items arrived while the batch ran, start the next batch;
    // otherwise mark the batcher idle.
    if (this.items.length) {
      this.ongoing = this.process()
    } else {
      this.ongoing = undefined
    }
  }

  async queue(item: T): Promise<void> {
    const result = new Promise<void>((resolve) => this.items.push({ item, onProcessed: resolve }))
    // Start processing immediately if no batch is in flight.
    if (!this.ongoing) this.ongoing = this.process()
    return result
  }
}

(this is roughly inspired by the Throttler class used extensively internally in vscode)
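As a usage sketch of the behavior described above, here is the Batcher driven by a mock executor that just records how items were grouped (the mock and the demo helper are hypothetical; in the patch the executor wraps a single IndexedDB readwrite transaction):

```typescript
// The Batcher class from the comment above, unchanged.
class Batcher<T> {
  private ongoing: Promise<void> | undefined
  private items: { item: T, onProcessed: () => void }[] = []

  constructor(private executor: (items: T[]) => Promise<void>) { }

  private async process() {
    const toProcess = this.items;
    this.items = [];
    await this.executor(toProcess.map(({ item }) => item))
    toProcess.map(({ onProcessed }) => onProcessed())
    if (this.items.length) {
      this.ongoing = this.process()
    } else {
      this.ongoing = undefined
    }
  }

  async queue(item: T): Promise<void> {
    const result = new Promise<void>((resolve) => this.items.push({ item, onProcessed: resolve }))
    if (!this.ongoing) this.ongoing = this.process()
    return result
  }
}

// Mock executor that records how items were grouped into batches.
const batches: number[][] = [];
const batcher = new Batcher<number>(async (items) => {
  batches.push(items);
  await Promise.resolve(); // simulate async work so later queue() calls pile up
});

async function demo(): Promise<number[][]> {
  // All three are queued in the same tick: 1 starts a batch immediately,
  // while 2 and 3 arrive in flight and are flushed together as one batch.
  await Promise.all([batcher.queue(1), batcher.queue(2), batcher.queue(3)]);
  return batches; // [[1], [2, 3]]
}
```

This shows the "no extra cost for single operations" property: a lone queue() call executes immediately, and grouping only kicks in under contention.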


