FasterLog based Pubsub hangs if log file deleted

In the code below, I delete the log file and then restart the producer and consumer. I expect the consumer to miss everything stored in the log file during iteration 1, but it should still be able to consume the messages produced during iteration 2. Instead, the consumer hangs completely and has no way to get out. Even if I wire up cancellation tokens, the iterations just keep getting cancelled at the iterator.WaitAsync() call. How can we get out of this state?

Version: 1.9.10

namespace FasterLogPlayground
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.CompilerServices;
    using System.Text;
    using System.Threading.Tasks;
    using FASTER.core;

    internal class Program
    {
        private static IDevice device;
        private static FasterLog log;
        private static FasterLogScanIterator iterator;

        static async Task Main(string[] args)
        {
            // Produce some logs and consume them partially
            device = Devices.CreateLogDevice("hlog.log", useIoCompletionPort: true);
            log = new FasterLog(new FasterLogSettings { LogDevice = device, MemorySizeBits = 26, PageSizeBits = 20, MutableFraction = 0.5, SegmentSizeBits = 20 });
            iterator = log.Scan(log.BeginAddress, long.MaxValue, "logtest1", true, ScanBufferingMode.DoublePageBuffering, true);

            TaskCompletionSource<bool> firstIterationTaskCompletionSource = new TaskCompletionSource<bool>();
            Task.Run(() => CommitterAsync(log, TimeSpan.FromMilliseconds(100), firstIterationTaskCompletionSource));
            var tasks = new List<Task>();
            tasks.Add(ProducerAsync(log, "Message ", 100));
            tasks.Add(ConsumerAsync(iterator, log, "Consumer1", 10));
            await Task.WhenAll(tasks).ConfigureAwait(false);
            firstIterationTaskCompletionSource.SetResult(true);

            // Give some delay for commit to complete
            await Task.Delay(500).ConfigureAwait(false);

            iterator.Dispose();
            log.Dispose();
            device.Dispose();

            // Delete the log file
            File.Delete("hlog.log.0");

            // Now try to produce and consume again.
            // Expected behavior: My expectation is that we will recover and start to consume whatever is produced starting at this point.
            // Actual behavior: The producer produces everything, but consumer is stuck forever
            device = Devices.CreateLogDevice("hlog.log", useIoCompletionPort: true);
            log = new FasterLog(new FasterLogSettings { LogDevice = device, MemorySizeBits = 26, PageSizeBits = 20, MutableFraction = 0.5, SegmentSizeBits = 20 });
            iterator = log.Scan(log.BeginAddress, long.MaxValue, "logtest1", true, ScanBufferingMode.DoublePageBuffering, true);

            TaskCompletionSource<bool> secondIterationTaskCompletionSource = new TaskCompletionSource<bool>();
            Task.Run(() => CommitterAsync(log, TimeSpan.FromMilliseconds(100), secondIterationTaskCompletionSource));
            tasks = new List<Task>();
            tasks.Add(ProducerAsync(log, "AnotherMessage ", 100));
            tasks.Add(ConsumerAsync(iterator, log, "Consumer2", 10));
            await Task.WhenAll(tasks).ConfigureAwait(false);
            secondIterationTaskCompletionSource.SetResult(true);

            Console.WriteLine("Done");
        }

        static async Task CommitterAsync(FasterLog log, TimeSpan delay, TaskCompletionSource<bool> tcs)
        {
            while (!tcs.Task.IsCompleted)
            {
                await Task.Delay(delay).ConfigureAwait(false);
                await log.CommitAsync().ConfigureAwait(false);
            }
        }

        static async Task ProducerAsync(FasterLog log, string prefix, int numberOfIterations)
        {
            var i = 0;

            while (i < numberOfIterations)
            {
                try
                {
                    await log.EnqueueAsync(Encoding.UTF8.GetBytes(prefix + i.ToString())).ConfigureAwait(false);
                    await log.RefreshUncommittedAsync().ConfigureAwait(false);
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Enqueue failed in Iteration {i}: {ex}");
                }
                finally
                {
                    i++;
                }
            }

            Console.WriteLine($"Producer {prefix} complete");
        }

        static async Task ConsumerAsync(FasterLogScanIterator iterator, FasterLog log, string name, int numberOfIterations)
        {
            var i = 0;

            while (i < numberOfIterations)
            {
                try
                {
                    byte[] result;
                    int length;
                    long nextAddress;
                    while (!iterator.GetNext(out result, out length, out _, out nextAddress))
                    {
                        // THIS WAITASYNC BELOW HANGS
                        if (!await iterator.WaitAsync().ConfigureAwait(false))
                        {
                            throw new InvalidOperationException("InMemoryQueueWithPersistence has been shutdown and cannot dequeue any more.");
                        }
                    }

                    iterator.CompleteUntil(nextAddress);

                    log.TruncateUntil(nextAddress);
                    Console.WriteLine($"Consumer {name} consumed: {Encoding.UTF8.GetString(result, 0, length)}");
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Dequeue failed in Iteration {i}: {ex}");
                }
                finally
                {
                    i++;
                }
            }

            Console.WriteLine($"Consumer {name} complete");
        }
    }
}

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
badrishc commented, Feb 1, 2022

The linked PR makes recovery throw an exception that is caught by the upstream FasterLog recovery logic, which in turn will not recover to the specified commit. In this case we therefore start with a clean, unrecovered slate, which is the best we can do. In v2, the user can catch this exception by explicitly calling fasterlog.Recover() instead of setting logSettings.TryRecoverLatest, and can then do any other custom repair.
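The v2 approach described in this comment might look roughly like the sketch below. It is an illustration under assumptions, not the maintainers' code: the helper name OpenWithExplicitRecovery is hypothetical, the exact exception type thrown by Recover() is not stated in the thread (so the sketch catches Exception), and whether a log instance stays usable after a failed Recover() should be checked against the FASTER v2 documentation.

```csharp
using System;
using FASTER.core;

internal static class ExplicitRecoverySketch
{
    // Hypothetical helper, not part of FASTER: opens a FasterLog with automatic
    // recovery turned off, then attempts recovery explicitly so that a failure
    // (for example, a deleted log segment) surfaces to the caller's code.
    public static FasterLog OpenWithExplicitRecovery(IDevice device)
    {
        var settings = new FasterLogSettings
        {
            LogDevice = device,
            // Assumption: v2 setting named in the comment above. With it off,
            // the constructor does not try to recover the latest commit.
            TryRecoverLatest = false
        };

        var log = new FasterLog(settings);
        try
        {
            // Explicit recovery, as suggested by the maintainer for v2.
            log.Recover();
        }
        catch (Exception ex)
        {
            // Recovery failed. Custom repair would go here, for example
            // deleting stale commit metadata before continuing.
            Console.WriteLine($"Recovery failed, continuing unrecovered: {ex.Message}");
        }

        return log;
    }
}
```

A caller would pass in the device created with Devices.CreateLogDevice("hlog.log") as in the repro above, and start scanning from log.BeginAddress on the returned log.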

0 reactions
vangarp commented, Feb 2, 2022

Much appreciated, @badrishc. May I know when the next release is scheduled? I would like this fix along with the change you made recently that deletes old segments which are not in memory.
