Porting ImageClassification to Jupyter

ML.NET version: 1.4

I’m working on porting this sample to ML.NET 1.4 so that it can be used in Jupyter Notebooks. Even though it works smoothly as a VS solution, when I try to run it in a Jupyter notebook I get the following error:

System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data

To do this, I’ve split the Program.cs file into different cells. To handle the dependencies more easily (and to avoid filling the notebook with other declarations), I’ve also referenced the .dll files created at /bin/Debug when running it in VS.

This is the code and the outputs of the notebook cells:

// ML.NET Nuget packages installation
#r "nuget:Microsoft.ML,1.4"
#r "nuget:Microsoft.ML.ImageAnalytics"
#r "nuget:Microsoft.ML.Vision"
#r "nuget:SciSharp.Tensorflow.Redist"
#r "nuget:SharpZipLib"
    
#r "C:\Users\******\Desktop\DeepLearning_ImageClassification_Training\v_jupyter\ImageClassification.Train\bin\Debug\netcoreapp2.1\ImageClassification.Shared.dll"
#r "C:\Users\******\Desktop\DeepLearning_ImageClassification_Training\v_jupyter\ImageClassification.Train\bin\Debug\netcoreapp2.1\ImageClassification.Train.dll"

Installing package SharpZipLib…
Installing package Microsoft.ML.ImageAnalytics…
Installing package Microsoft.ML.Vision…
Installing package Microsoft.ML, version 1.4…
Installing package SciSharp.Tensorflow.Redist…

(By the way, for these references I’ve found no way to do the #r without the absolute path, despite what is said in dotnet/try#698 and the suggestions inside that issue.)

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using Common;
using ImageClassification;
using ImageClassification.DataModels;
using ImageClassification.Train;
using Microsoft.ML;
using Microsoft.ML.Transforms;
using Microsoft.ML.Vision;
using static Microsoft.ML.Transforms.ValueToKeyMappingEstimator;

private static void EvaluateModel(MLContext mlContext, IDataView testDataset, ITransformer trainedModel)
{
    Console.WriteLine("Making predictions in bulk for evaluating model's quality...");

    // Measuring time
    var watch = Stopwatch.StartNew();

    var predictionsDataView = trainedModel.Transform(testDataset);

    var metrics = mlContext.MulticlassClassification.Evaluate(predictionsDataView, labelColumnName:"LabelAsKey", predictedLabelColumnName: "PredictedLabel");
    ConsoleHelper.PrintMultiClassClassificationMetrics("TensorFlow DNN Transfer Learning", metrics);

    watch.Stop();
    var elapsed2Ms = watch.ElapsedMilliseconds;

    Console.WriteLine($"Predicting and Evaluation took: {elapsed2Ms / 1000} seconds");
}

private static void TrySinglePrediction(string imagesFolderPathForPredictions, MLContext mlContext, ITransformer trainedModel)
{
    // Create prediction function to try one prediction
    var predictionEngine = mlContext.Model
        .CreatePredictionEngine<InMemoryImageData, ImagePrediction>(trainedModel);

    var testImages = FileUtils.LoadInMemoryImagesFromDirectory(
        imagesFolderPathForPredictions, false);

    var imageToPredict = testImages.First();

    var prediction = predictionEngine.Predict(imageToPredict);

    Console.WriteLine(
        $"Image Filename : [{imageToPredict.ImageFileName}], " +
        $"Scores : [{string.Join(",", prediction.Score)}], " +
        $"Predicted Label : {prediction.PredictedLabel}");
}


public static IEnumerable<ImageData> LoadImagesFromDirectory(
    string folder,
    bool useFolderNameAsLabel = true)
    => FileUtils.LoadImagesFromDirectory(folder, useFolderNameAsLabel)
        .Select(x => new ImageData(x.imagePath, x.label));

public static string DownloadImageSet(string imagesDownloadFolder)
{
    // get a set of images to teach the network about the new classes

    //SINGLE SMALL FLOWERS IMAGESET (200 files)
    const string fileName = "flower_photos_small_set.zip";
    var url = $"https://mlnetfilestorage.file.core.windows.net/imagesets/flower_images/flower_photos_small_set.zip?st=2019-08-07T21%3A27%3A44Z&se=2030-08-08T21%3A27%3A00Z&sp=rl&sv=2018-03-28&sr=f&sig=SZ0UBX47pXD0F1rmrOM%2BfcwbPVob8hlgFtIlN89micM%3D";
    Web.Download(url, imagesDownloadFolder, fileName);
    Compress.UnZip(Path.Join(imagesDownloadFolder, fileName), imagesDownloadFolder);

    //SINGLE FULL FLOWERS IMAGESET (3,600 files)
    //string fileName = "flower_photos.tgz";
    //string url = $"http://download.tensorflow.org/example_images/{fileName}";
    //Web.Download(url, imagesDownloadFolder, fileName);
    //Compress.ExtractTGZ(Path.Join(imagesDownloadFolder, fileName), imagesDownloadFolder);

    return Path.GetFileNameWithoutExtension(fileName);
}

public static void ConsoleWriteImagePrediction(string ImagePath, string Label, string PredictedLabel, float Probability)
{
    var defaultForeground = Console.ForegroundColor;
    var labelColor = ConsoleColor.Magenta;
    var probColor = ConsoleColor.Blue;

    Console.Write("Image File: ");
    Console.ForegroundColor = labelColor;
    Console.Write($"{Path.GetFileName(ImagePath)}");
    Console.ForegroundColor = defaultForeground;
    Console.Write(" original labeled as ");
    Console.ForegroundColor = labelColor;
    Console.Write(Label);
    Console.ForegroundColor = defaultForeground;
    Console.Write(" predicted as ");
    Console.ForegroundColor = labelColor;
    Console.Write(PredictedLabel);
    Console.ForegroundColor = defaultForeground;
    Console.Write(" with score ");
    Console.ForegroundColor = probColor;
    Console.Write(Probability);
    Console.ForegroundColor = defaultForeground;
    Console.WriteLine("");
}

private static void FilterMLContextLog(object sender, LoggingEventArgs e)
{
    if (e.Message.StartsWith("[Source=ImageClassificationTrainer;"))
    {
        Console.WriteLine(e.Message);
    }
}

string outputMlNetModelFilePath = "ImageClassification.Train/assets/outputs/imageClassifer.zip";
string imagesFolderPathForPredictions = "ImageClassification.Train/assets/inputs/images/images-for-prediction/FlowersPredictions";

string imagesDownloadFolderPath = "ImageClassification.Train/assets/inputs/images/";
// 1. Download the image set and unzip
string finalImagesFolderName = DownloadImageSet(imagesDownloadFolderPath);
string fullImagesetFolderPath = Path.Combine(imagesDownloadFolderPath, finalImagesFolderName);

var mlContext = new MLContext(seed: 1);

// Specify MLContext Filter to only show feedback log/traces about ImageClassification
// This is not needed for feedback output if using the explicit MetricsCallback parameter
mlContext.Log += FilterMLContextLog;    

ImageClassification.Train/assets/inputs/images/flower_photos_small_set.zip already exists.

// 2. Load the initial full image-set into an IDataView and shuffle so it'll be better balanced
IEnumerable<ImageData> images = LoadImagesFromDirectory(folder: fullImagesetFolderPath, useFolderNameAsLabel: true);
IDataView fullImagesDataset = mlContext.Data.LoadFromEnumerable(images);
IDataView shuffledFullImageFilePathsDataset = mlContext.Data.ShuffleRows(fullImagesDataset);

// 3. Load Images with in-memory type within the IDataView and Transform Labels to Keys (Categorical)
IDataView shuffledFullImagesDataset = mlContext.Transforms.Conversion.
        MapValueToKey(outputColumnName: "LabelAsKey", inputColumnName: "Label", keyOrdinality: KeyOrdinality.ByValue)
    .Append(mlContext.Transforms.LoadRawImageBytes(
                                    outputColumnName: "Image",
                                    imageFolder: fullImagesetFolderPath,
                                    inputColumnName: "ImagePath"))
    .Fit(shuffledFullImageFilePathsDataset)
    .Transform(shuffledFullImageFilePathsDataset);

// 4. Split the data 80:20 into train and test sets, train and evaluate.
var trainTestData = mlContext.Data.TrainTestSplit(shuffledFullImagesDataset, testFraction: 0.2);
IDataView trainDataView = trainTestData.TrainSet;
IDataView testDataView = trainTestData.TestSet;

// 5. Define the model's training pipeline using DNN default values
//
var pipeline = mlContext.MulticlassClassification.Trainers
        .ImageClassification(featureColumnName: "Image",
                             labelColumnName: "LabelAsKey",
                             validationSet: testDataView)
    .Append(mlContext.Transforms.Conversion.MapKeyToValue(outputColumnName: "PredictedLabel",
                                                          inputColumnName: "PredictedLabel"));

[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Channel started

// 5.1 (OPTIONAL) Define the model's training pipeline by using explicit hyper-parameters
//
//var options = new ImageClassificationTrainer.Options()
//{
//    FeatureColumnName = "Image",
//    LabelColumnName = "LabelAsKey",
//    // Just by changing/selecting InceptionV3/MobilenetV2/ResnetV250  
//    // you can try a different DNN architecture (TensorFlow pre-trained model). 
//    Arch = ImageClassificationTrainer.Architecture.MobilenetV2,
//    Epoch = 50,       //100
//    BatchSize = 10,
//    LearningRate = 0.01f,
//    MetricsCallback = (metrics) => Console.WriteLine(metrics),
//    ValidationSet = testDataView
//};

//var pipeline = mlContext.MulticlassClassification.Trainers.ImageClassification(options)
//        .Append(mlContext.Transforms.Conversion.MapKeyToValue(
//            outputColumnName: "PredictedLabel",
//            inputColumnName: "PredictedLabel"));

// 6. Train/create the ML model
Console.WriteLine("*** Training the image classification model with DNN Transfer Learning on top of the selected pre-trained model/architecture ***");

// Measuring training time
var watch = Stopwatch.StartNew();

//Train
ITransformer trainedModel = pipeline.Fit(trainDataView);

watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;

Console.WriteLine($"Training with transfer learning took: {elapsedMs / 1000} seconds");

*** Training the image classification model with DNN Transfer Learning on top of the selected pre-trained model/architecture ***
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel started
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel finished. Elapsed 00:00:00.0028785.
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel disposed
Saver not created because there are no variables in the graph to restore

System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data
   at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
   at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass5_0.<ConsolidateCore>b__1()
   at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.Dispose(Boolean disposing)
   at Microsoft.ML.DataViewRow.Dispose()
   at Microsoft.ML.Data.SynchronizedCursorBase.Dispose(Boolean disposing)
   at Microsoft.ML.DataViewRow.Dispose()
   at Microsoft.ML.Data.LinkedRootCursorBase.Dispose(Boolean disposing)
   at Microsoft.ML.DataViewRow.Dispose()
   at Microsoft.ML.Vision.ImageClassificationTrainer.CacheFeaturizedImagesToDisk(IDataView input, String labelColumnName, String imageColumnName, ImageProcessor imageProcessor, String inputTensorName, String outputTensorName, String cacheFilePath, Dataset dataset, Action`1 metricsCallback)
   at Microsoft.ML.Vision.ImageClassificationTrainer.TrainModelCore(TrainContext trainContext)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Submission#13.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:9 (6 by maintainers)

Top GitHub Comments

1 reaction
jonsequitur commented, Jan 24, 2020

Relative path support has been fixed.

1 reaction
eerhardt commented, Jan 7, 2020

@Julexpuru - I was able to reproduce this on my machine. The key here is the inner exception, which the Jupyter notebook is not showing. @jonsequitur @colombod - is this something we should change? I will log a bug so we can talk about that separately.
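Until the notebook shows inner exceptions itself, one way to see them is to catch the exception and walk its InnerException chain by hand. The following is only a minimal sketch: the `ExceptionDump` class and its `Describe` helper are hypothetical names, and the `throw` statement is a stand-in for the failing `pipeline.Fit(trainDataView)` call with a simulated inner message.

```csharp
using System;
using System.IO;
using System.Text;

public static class ExceptionDump
{
    // Formats an exception and every nested InnerException, outermost first.
    // Hypothetical helper, not part of ML.NET or the notebook kernel.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        int depth = 0;
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            sb.AppendLine($"[{depth}] {current.GetType().Name}: {current.Message}");
            depth++;
        }
        return sb.ToString();
    }

    public static void Main()
    {
        try
        {
            // Stand-in for pipeline.Fit(trainDataView): an outer exception
            // wrapping a simulated inner one.
            throw new InvalidOperationException(
                "Splitter/consolidator worker encountered exception while consuming source data",
                new DirectoryNotFoundException("simulated missing directory"));
        }
        catch (Exception ex)
        {
            Console.Write(Describe(ex));
        }
    }
}
```

In a notebook cell, the same try/catch would go around the `pipeline.Fit(trainDataView)` line, with `Describe(ex)` written to the cell output.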

On my machine the inner exception being thrown says:

Inner Exception 1:
DirectoryNotFoundException: Could not find a part of the path 'C:\Users\eerhardt\ImageClassification.Train\assets\inputs\images\flower_photos_small_set\ImageClassification.Train\assets\inputs\images\flower_photos_small_set\roses\295257304_de893fc94d.jpg'.

Notice that the sub-path ImageClassification.Train\assets\inputs\images\flower_photos_small_set appears twice in that path. However, on my disk the image exists at 'C:\Users\eerhardt\ImageClassification.Train\assets\inputs\images\flower_photos_small_set\roses\295257304_de893fc94d.jpg'.

So your code is appending ImageClassification.Train\assets\inputs\images\flower_photos_small_set too many times.

I think that is because each ImageData item’s ImagePath is set to

ImagePath Label
ImageClassification.Train/assets/inputs/images/flower_photos_small_set\daisy\100080576_f52e8ee070_n.jpg daisy

But you are also passing ImageClassification.Train\assets\inputs\images\flower_photos_small_set as fullImagesetFolderPath into mlContext.Transforms.LoadRawImageBytes(..., imageFolder: fullImagesetFolderPath, ...).

Cutting one of those duplicate sub-paths out should fix the problem. I fixed it locally by passing string.Empty into LoadRawImageBytes(imageFolder).
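The duplication can be reproduced without ML.NET. This sketch assumes the image loader joins imageFolder and each row's ImagePath the way `Path.Combine` does; that is an assumption, though it matches the doubled path in the inner exception. `PathDuplicationDemo` and `Resolve` are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;

public static class PathDuplicationDemo
{
    // Joins an image folder with a per-row image path.
    // Path.Combine returns the second argument unchanged when the first is empty.
    public static string Resolve(string imageFolder, string imagePath)
        => Path.Combine(imageFolder, imagePath);

    public static void Main()
    {
        string folder = "ImageClassification.Train/assets/inputs/images/flower_photos_small_set";

        // Each row's ImagePath already starts with the folder, as in the table above.
        string imagePath = folder + "/daisy/100080576_f52e8ee070_n.jpg";

        // Passing the folder again prefixes it a second time.
        Console.WriteLine(Resolve(folder, imagePath));

        // Passing string.Empty leaves the row's path as it is.
        Console.WriteLine(Resolve(string.Empty, imagePath));
    }
}
```

In the notebook, the corresponding change would be `imageFolder: string.Empty` in the LoadRawImageBytes call, or alternatively storing ImagePath values relative to fullImagesetFolderPath.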