
Hooks are not adding more than 100 documents on afterCreate


As it appears, when adding a large dataset to MeiliSearch through the hooks, indexing stops at 100 entries. Since this is not the expected behavior, it is a bug.

queue.on('ready', function () {
    queue.process(function (job, done) {
      console.log('processing job ' + job.id);
      console.log('job data page: ', job.data.page);
      const mockCtx = {
        params: { collection: job.data.collection },
        send: (e) => e,
        request: {},
      };
      fetchCollection(mockCtx, { page: job.data.page })
        .then((r) => addCollectionRows(r))
        .catch((e) => console.log('queue error', e));
      console.log('processed job ' + job.data.page);
      setTimeout(function () {
        done(null, job.data.id);
      }, 10);
    });

    console.log('listening for jobs and processing queued jobs...');
  });

@MANTENN, let's continue here!

Could you try the following:

  1. Add your documents using the hooks
  2. The bug appears: only 100 entries are present.
  3. Using your meilisearch credentials try this route: https://docs.meilisearch.com/reference/api/updates.html#get-all-update-status

It should report the status of your document additions in MeiliSearch. Could you copy-paste the response here?
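For example, the status can be fetched with a plain request against that route (the host, index name, and API key below are placeholders; older MeiliSearch versions like the one in those docs expect the `X-Meili-API-Key` header):

```javascript
// Placeholders: adjust host, index uid, and key to your own setup.
const host = 'http://127.0.0.1:7700'
const indexUid = 'my_collection'
const url = `${host}/indexes/${indexUid}/updates`

// Node 18+ has a global fetch; the response lists every update
// (document additions included) with its status.
fetch(url, { headers: { 'X-Meili-API-Key': 'masterKey' } })
  .then((res) => res.json())
  .then((updates) => console.log(updates))
  .catch((e) => console.error('request failed:', e))
```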

Also, is there an error thrown in the server logs?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
MANTENN commented, Apr 2, 2021

Without the queue it still stops at 100 records, because the Strapi model fetches only the first 100 documents.

My code computes pagination, loops through the “pages”, and queues them with bee-queue (backed by Redis). I also recorded a video of myself running the plugin from the master branch without any modifications; it may be a bit long.

https://youtu.be/IyXbypNt7q8

I created a new branch as requested and saved the files without a linter. I will just paste the code here, since my linter runs via git hooks and setting up another repo would take more time.

/config/functions/bootstrap.js

const {
  fetchCollection,
  addCollectionRows,
} = require('../../controllers/meilisearch');

const meilisearch = {
  http: (client) => strapi.plugins.meilisearch.services.http(client),
  client: (credentials) =>
    strapi.plugins.meilisearch.services.client(credentials),
  store: () => strapi.plugins.meilisearch.services.store,
  lifecycles: () => strapi.plugins.meilisearch.services.lifecycles,
};

module.exports = async () => {
  const store = strapi.store({
    environment: strapi.config.environment,
    type: 'plugin',
    name: 'meilisearch_store',
  });
  strapi.plugins.meilisearch.store = store;

  await initHooks(store);

  const Queue = require('bee-queue');
  const queue = new Queue('ingestion', {
    redis: {
      host: '127.0.0.1',
      port: 6379,
      db: 0,
      options: {
        removeOnSuccess: true,
      },
    },
  });

  queue.on('ready', function () {
    queue.process(function (job, done) {
      console.log('processing job ' + job.id);
      console.log('job data page: ', job.data.page);
      const mockCtx = {
        params: { collection: job.data.collection },
        send: (e) => e,
        request: { body: { data: '' } },
      };
      fetchCollection(mockCtx, { page: job.data.page })
        .then((r) => addCollectionRows(r))
        .catch((e) => console.log('queue error', e));
      console.log('processed job ' + job.data.page);
      setTimeout(function () {
        done(null, job.data.id);
      }, 10);
    });

    console.log('listening for jobs and processing queued jobs...');
  });
};
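One thing worth noting in the handler above: `done()` fires on a 10 ms timer, before `fetchCollection(...).then(addCollectionRows)` has resolved, so jobs are marked finished while the indexing request may still be in flight. A minimal sketch of a promise-based handler, with stub functions standing in for the real `fetchCollection`/`addCollectionRows` (bee-queue also accepts a handler that returns a promise):

```javascript
// Stubs standing in for the real fetchCollection/addCollectionRows,
// just to make the control flow runnable here.
const fetchCollection = async (ctx, { page }) => ({ ...ctx, page })
const addCollectionRows = async (ctx) => `indexed page ${ctx.page}`

// Returning the promise chain ties job completion to the actual work,
// instead of an arbitrary 10 ms timer.
function processJob(job) {
  const mockCtx = {
    params: { collection: job.data.collection },
    send: (e) => e,
    request: { body: { data: '' } },
  }
  return fetchCollection(mockCtx, { page: job.data.page }).then(addCollectionRows)
}

// With bee-queue you would register it as: queue.process(processJob)
processJob({ data: { collection: 'restaurant', page: 2 } }).then(console.log)
```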

controllers/meilisearch.js

const Queue = require('bee-queue');
const queue = new Queue('ingestion', {
  redis: {
    host: '127.0.0.1',
    port: 6379,
    db: 0,
    options: { removeOnSuccess: true },
  },
});

const meilisearch = {
  http: (client) => strapi.plugins.meilisearch.services.http(client),
  client: (credentials) =>
    strapi.plugins.meilisearch.services.client(credentials),
  store: () => strapi.plugins.meilisearch.services.store,
  lifecycles: () => strapi.plugins.meilisearch.services.lifecycles,
};

async function addCollectionRows(ctx) {
  const { collection } = ctx.params;
  const { data } = ctx.request.body;
  const credentials = await getCredentials();
  if (data.length > 0) {
    return meilisearch.http(meilisearch.client(credentials)).addDocuments({
      indexUid: collection,
      data,
    });
  } else {
    return await meilisearch.http(meilisearch.client(credentials)).createIndex({
      indexUid: collection,
    });
  }
}

async function fetchCollection(ctx, options = { page: 0 }) {
  try {
    const { collection } = ctx.params;

    if (!Object.keys(strapi.services).includes(collection)) {
      return { error: true, message: 'Collection not found' };
    }
    const perPage = 100;
    const rows = await strapi.services[collection].find({
      _publicationState: 'preview',
      _limit: perPage, // lock limit to keep the behaviour in case strapi updates this
      _start: options.page * perPage,
    });
    ctx.request.body = { data: rows };
    console.log('processed', options.page);
    return ctx;
  } catch (e) {
    console.log(`fetchCollection${options ? '(queued):' : ':'}\n`, e);
    return ctx;
  }
}

async function addCollection(ctx) {
  try {
    const recordsCount = await strapi.services[ctx.params.collection].count({
      _publicationState: 'preview',
    });
    // default 100 are returned
    const pages = Math.ceil(recordsCount / 100);
    // page one because the first 100 are inserted below the loop right away
    // the others are queued to not take up too much performance
    for (let page = 1; page < pages; page++) {
      const job = queue.createJob({ collection: ctx.params.collection, page });
      job.save();
      // job.on('succeeded', (result) => {
      //   console.log(`Received result for job ${job.id}: ${result}`);
      // });
    }
    console.log('pages', pages);
  } catch (e) {
    console.log('addCollection:\n', e);
  }
  return addCollectionRows(await fetchCollection(ctx));
}
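To make the paging arithmetic in `addCollection` concrete: page 0 is processed inline and pages 1 through N-1 are queued. A standalone sketch of that mapping (plain Node, no Strapi; 100 matches the default `find()` limit used above):

```javascript
// Page size matching Strapi's default find() limit of 100 records.
const PER_PAGE = 100

// Returns the page indexes addCollection() queues: page 0 is handled
// inline, so only pages 1..N-1 go to the queue.
function queuedPages(recordsCount) {
  const pages = Math.ceil(recordsCount / PER_PAGE)
  return Array.from({ length: Math.max(pages - 1, 0) }, (_, i) => i + 1)
}

console.log(queuedPages(100)) // one page, handled inline, nothing queued
console.log(queuedPages(250)) // three pages total, pages 1 and 2 queued
```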


module.exports = {
  getCredentials: async (ctx) => sendCtx(ctx, getCredentials),
  addCollectionRows: async (ctx) => sendCtx(ctx, addCollectionRows),
  waitForDocumentsToBeIndexed: async (ctx) => sendCtx(ctx, waitForDocumentsToBeIndexed),
  getIndexes: async (ctx) => sendCtx(ctx, getIndexes),
  getCollections: async (ctx) => sendCtx(ctx, getCollections),
  addCollection: async (ctx) => sendCtx(ctx, addCollection),
  addCredentials: async (ctx) => sendCtx(ctx, addCredentials),
  deleteAllIndexes: async (ctx) => sendCtx(ctx, deleteAllIndexes),
  deleteIndex: async (ctx) => sendCtx(ctx, deleteIndex),
  UpdateCollections: async (ctx) => sendCtx(ctx, UpdateCollections),
  reload: async (ctx) => sendCtx(ctx, reload),
  fetchCollection
}

That’s it for the queue.

0 reactions
bidoubiwa commented, Apr 2, 2021

Hello! I see 😃 Thanks for the explanation. This is a good suggestion. Could you maybe open an issue on https://github.com/meilisearch/meilisearch ?

You still need a queue to fetch records from the database to make sure the plugin isn’t taking up too much Strapi resources

Ah yes, big datasets may suffer from all these fetches! Your suggestion addresses this problem. Until it is implemented, we could think about a configuration option where you add Redis credentials and activate queued fetching, like you did.

Meanwhile, there is another solution. Since the big problem occurs when you first add your whole collection to MeiliSearch, it is also possible to add the collection in a separate script, where you would add all your rows to MeiliSearch using meilisearch-js, for example:

import { MeiliSearch } from 'meilisearch'

;(async () => {
  const client = new MeiliSearch({
    host: 'http://127.0.0.1:7700',
    apiKey: 'masterKey',
  })

  // Fetch all the rows of your collection from Strapi progressively
  // and add them to MeiliSearch
  const documents = [] // e.g. rows fetched from the Strapi API
  let response = await client.index("big_dataset").addDocuments(documents)

  console.log(response) // => { "updateId": 0 }
})()

Once this is done, if you used the same name as your collection in Strapi (big_dataset in the code above), you do not need to add the collection, as the checkbox will already be checked. It's not the perfect solution, but we'll get there 😃

Meanwhile, it should at least work on smaller datasets! I will make the PR.
