Porting ImageClassification to Jupyter
ML.NET version: 1.4
I'm working on porting this sample to ML.NET 1.4 so that it can be used in Jupyter Notebooks. Even though it runs smoothly as a Visual Studio solution, when I try to run it in a Jupyter notebook I get the following error:
System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data
To do this, I've split the Program.cs file into separate cells. To handle the dependencies more easily (and to avoid filling the notebook with other declarations), I've also referenced the DLLs that VS builds into /bin/Debug.
This is the code and the outputs of the notebook cells:
// ML.NET Nuget packages installation
#r "nuget:Microsoft.ML,1.4"
#r "nuget:Microsoft.ML.ImageAnalytics"
#r "nuget:Microsoft.ML.Vision"
#r "nuget:SciSharp.Tensorflow.Redist"
#r "nuget:SharpZipLib"
#r "C:\Users\******\Desktop\DeepLearning_ImageClassification_Training\v_jupyter\ImageClassification.Train\bin\Debug\netcoreapp2.1\ImageClassification.Shared.dll"
#r "C:\Users\******\Desktop\DeepLearning_ImageClassification_Training\v_jupyter\ImageClassification.Train\bin\Debug\netcoreapp2.1\ImageClassification.Train.dll"
Installing package SharpZipLib…
Installing package Microsoft.ML.ImageAnalytics…
Installing package Microsoft.ML.Vision…
Installing package Microsoft.ML, version 1.4…
Installing package SciSharp.Tensorflow.Redist…
(By the way, in this cell I've found no way to use #r without an absolute path, despite what is said in dotnet/try#698 and the suggestions in the issue linked from it.)
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using Common;
using ImageClassification;
using ImageClassification.DataModels;
using ImageClassification.Train;
using Microsoft.ML;
using Microsoft.ML.Transforms;
using Microsoft.ML.Vision;
using static Microsoft.ML.Transforms.ValueToKeyMappingEstimator;
private static void EvaluateModel(MLContext mlContext, IDataView testDataset, ITransformer trainedModel)
{
    Console.WriteLine("Making predictions in bulk for evaluating model's quality...");
    // Measuring time
    var watch = Stopwatch.StartNew();
    var predictionsDataView = trainedModel.Transform(testDataset);
    var metrics = mlContext.MulticlassClassification.Evaluate(predictionsDataView, labelColumnName: "LabelAsKey", predictedLabelColumnName: "PredictedLabel");
    ConsoleHelper.PrintMultiClassClassificationMetrics("TensorFlow DNN Transfer Learning", metrics);
    watch.Stop();
    var elapsed2Ms = watch.ElapsedMilliseconds;
    Console.WriteLine($"Predicting and Evaluation took: {elapsed2Ms / 1000} seconds");
}
private static void TrySinglePrediction(string imagesFolderPathForPredictions, MLContext mlContext, ITransformer trainedModel)
{
    // Create prediction function to try one prediction
    var predictionEngine = mlContext.Model
        .CreatePredictionEngine<InMemoryImageData, ImagePrediction>(trainedModel);
    var testImages = FileUtils.LoadInMemoryImagesFromDirectory(
        imagesFolderPathForPredictions, false);
    var imageToPredict = testImages.First();
    var prediction = predictionEngine.Predict(imageToPredict);
    Console.WriteLine(
        $"Image Filename : [{imageToPredict.ImageFileName}], " +
        $"Scores : [{string.Join(",", prediction.Score)}], " +
        $"Predicted Label : {prediction.PredictedLabel}");
}
public static IEnumerable<ImageData> LoadImagesFromDirectory(
    string folder,
    bool useFolderNameAsLabel = true)
    => FileUtils.LoadImagesFromDirectory(folder, useFolderNameAsLabel)
        .Select(x => new ImageData(x.imagePath, x.label));
public static string DownloadImageSet(string imagesDownloadFolder)
{
    // get a set of images to teach the network about the new classes
    //SINGLE SMALL FLOWERS IMAGESET (200 files)
    const string fileName = "flower_photos_small_set.zip";
    var url = $"https://mlnetfilestorage.file.core.windows.net/imagesets/flower_images/flower_photos_small_set.zip?st=2019-08-07T21%3A27%3A44Z&se=2030-08-08T21%3A27%3A00Z&sp=rl&sv=2018-03-28&sr=f&sig=SZ0UBX47pXD0F1rmrOM%2BfcwbPVob8hlgFtIlN89micM%3D";
    Web.Download(url, imagesDownloadFolder, fileName);
    Compress.UnZip(Path.Join(imagesDownloadFolder, fileName), imagesDownloadFolder);
    //SINGLE FULL FLOWERS IMAGESET (3,600 files)
    //string fileName = "flower_photos.tgz";
    //string url = $"http://download.tensorflow.org/example_images/{fileName}";
    //Web.Download(url, imagesDownloadFolder, fileName);
    //Compress.ExtractTGZ(Path.Join(imagesDownloadFolder, fileName), imagesDownloadFolder);
    return Path.GetFileNameWithoutExtension(fileName);
}
public static void ConsoleWriteImagePrediction(string ImagePath, string Label, string PredictedLabel, float Probability)
{
    var defaultForeground = Console.ForegroundColor;
    var labelColor = ConsoleColor.Magenta;
    var probColor = ConsoleColor.Blue;
    Console.Write("Image File: ");
    Console.ForegroundColor = labelColor;
    Console.Write($"{Path.GetFileName(ImagePath)}");
    Console.ForegroundColor = defaultForeground;
    Console.Write(" original labeled as ");
    Console.ForegroundColor = labelColor;
    Console.Write(Label);
    Console.ForegroundColor = defaultForeground;
    Console.Write(" predicted as ");
    Console.ForegroundColor = labelColor;
    Console.Write(PredictedLabel);
    Console.ForegroundColor = defaultForeground;
    Console.Write(" with score ");
    Console.ForegroundColor = probColor;
    Console.Write(Probability);
    Console.ForegroundColor = defaultForeground;
    Console.WriteLine("");
}
private static void FilterMLContextLog(object sender, LoggingEventArgs e)
{
    if (e.Message.StartsWith("[Source=ImageClassificationTrainer;"))
    {
        Console.WriteLine(e.Message);
    }
}
string outputMlNetModelFilePath = "ImageClassification.Train/assets/outputs/imageClassifer.zip";
string imagesFolderPathForPredictions = "ImageClassification.Train/assets/inputs/images/images-for-prediction/FlowersPredictions";
string imagesDownloadFolderPath = "ImageClassification.Train/assets/inputs/images/";
// 1. Download the image set and unzip
string finalImagesFolderName = DownloadImageSet(imagesDownloadFolderPath);
string fullImagesetFolderPath = Path.Combine(imagesDownloadFolderPath, finalImagesFolderName);
var mlContext = new MLContext(seed: 1);
// Specify MLContext Filter to only show feedback log/traces about ImageClassification
// This is not needed for feedback output if using the explicit MetricsCallback parameter
mlContext.Log += FilterMLContextLog;
ImageClassification.Train/assets/inputs/images/flower_photos_small_set.zip already exists.
// 2. Load the initial full image-set into an IDataView and shuffle so it'll be better balanced
IEnumerable<ImageData> images = LoadImagesFromDirectory(folder: fullImagesetFolderPath, useFolderNameAsLabel: true);
IDataView fullImagesDataset = mlContext.Data.LoadFromEnumerable(images);
IDataView shuffledFullImageFilePathsDataset = mlContext.Data.ShuffleRows(fullImagesDataset);
// 3. Load Images with in-memory type within the IDataView and Transform Labels to Keys (Categorical)
IDataView shuffledFullImagesDataset = mlContext.Transforms.Conversion
    .MapValueToKey(outputColumnName: "LabelAsKey", inputColumnName: "Label", keyOrdinality: KeyOrdinality.ByValue)
    .Append(mlContext.Transforms.LoadRawImageBytes(
        outputColumnName: "Image",
        imageFolder: fullImagesetFolderPath,
        inputColumnName: "ImagePath"))
    .Fit(shuffledFullImageFilePathsDataset)
    .Transform(shuffledFullImageFilePathsDataset);
// 4. Split the data 80:20 into train and test sets, train and evaluate.
var trainTestData = mlContext.Data.TrainTestSplit(shuffledFullImagesDataset, testFraction: 0.2);
IDataView trainDataView = trainTestData.TrainSet;
IDataView testDataView = trainTestData.TestSet;
// 5. Define the model's training pipeline using DNN default values
//
var pipeline = mlContext.MulticlassClassification.Trainers
    .ImageClassification(featureColumnName: "Image",
                         labelColumnName: "LabelAsKey",
                         validationSet: testDataView)
    .Append(mlContext.Transforms.Conversion.MapKeyToValue(outputColumnName: "PredictedLabel",
                                                          inputColumnName: "PredictedLabel"));
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Channel started
// 5.1 (OPTIONAL) Define the model's training pipeline by using explicit hyper-parameters
//
//var options = new ImageClassificationTrainer.Options()
//{
// FeatureColumnName = "Image",
// LabelColumnName = "LabelAsKey",
// // Just by changing/selecting InceptionV3/MobilenetV2/ResnetV250
// // you can try a different DNN architecture (TensorFlow pre-trained model).
// Arch = ImageClassificationTrainer.Architecture.MobilenetV2,
// Epoch = 50, //100
// BatchSize = 10,
// LearningRate = 0.01f,
// MetricsCallback = (metrics) => Console.WriteLine(metrics),
// ValidationSet = testDataView
//};
//var pipeline = mlContext.MulticlassClassification.Trainers.ImageClassification(options)
// .Append(mlContext.Transforms.Conversion.MapKeyToValue(
// outputColumnName: "PredictedLabel",
// inputColumnName: "PredictedLabel"));
// 6. Train/create the ML model
Console.WriteLine("*** Training the image classification model with DNN Transfer Learning on top of the selected pre-trained model/architecture ***");
// Measuring training time
var watch = Stopwatch.StartNew();
//Train
ITransformer trainedModel = pipeline.Fit(trainDataView);
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
Console.WriteLine($"Training with transfer learning took: {elapsedMs / 1000} seconds");
*** Training the image classification model with DNN Transfer Learning on top of the selected pre-trained model/architecture ***
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel started
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel finished. Elapsed 00:00:00.0028785.
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel disposed
Saver not created because there are no variables in the graph to restore
System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data
at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass5_0.<ConsolidateCore>b__1()
at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.Dispose(Boolean disposing)
at Microsoft.ML.DataViewRow.Dispose()
at Microsoft.ML.Data.SynchronizedCursorBase.Dispose(Boolean disposing)
at Microsoft.ML.DataViewRow.Dispose()
at Microsoft.ML.Data.LinkedRootCursorBase.Dispose(Boolean disposing)
at Microsoft.ML.DataViewRow.Dispose()
at Microsoft.ML.Vision.ImageClassificationTrainer.CacheFeaturizedImagesToDisk(IDataView input, String labelColumnName, String imageColumnName, ImageProcessor imageProcessor, String inputTensorName, String outputTensorName, String cacheFilePath, Dataset dataset, Action`1 metricsCallback)
at Microsoft.ML.Vision.ImageClassificationTrainer.TrainModelCore(TrainContext trainContext)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Submission#13.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
Top GitHub Comments
Relative path support has been fixed.
@Julexpuru, I was able to reproduce this on my machine. The key here is the inner exception, which the Jupyter notebook is not showing. @jonsequitur @colombod, is this something we should change? I will log a bug so we can discuss that separately.
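Since the notebook surfaced only the outer `InvalidOperationException`, one way to see the real cause is to walk the `InnerException` chain yourself. The sketch below is plain C#; the class name `ExceptionChain` and the method `Describe` are hypothetical helpers, not part of ML.NET or the notebook kernel, and the inner exception built in `Main` is a made-up placeholder rather than the actual error from this issue.

```csharp
using System;
using System.Collections.Generic;

public static class ExceptionChain
{
    // Returns one "FullTypeName: Message" line for the exception itself and
    // for each nested InnerException, outermost first.
    public static List<string> Describe(Exception ex)
    {
        var lines = new List<string>();
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            lines.Add($"{current.GetType().FullName}: {current.Message}");
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical stand-in for the notebook failure: the outer message is the
        // one shown in the notebook, the inner cause is invented for the demo.
        var error = new InvalidOperationException(
            "Splitter/consolidator worker encountered exception while consuming source data",
            new Exception("hypothetical inner cause"));

        foreach (string line in Describe(error))
        {
            Console.WriteLine(line);
        }
    }
}
```

In the training cell, `pipeline.Fit(trainDataView)` could be wrapped in a `try`/`catch (Exception ex)` that prints each line of `ExceptionChain.Describe(ex)` before rethrowing. Printing `ex.ToString()` is a simpler alternative, since it also includes the inner exceptions.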
On my machine the inner exception being thrown says:
Notice that the sub-path `ImageClassification.Train\assets\inputs\images\flower_photos_small_set` appears twice in that path. However, on my disk the image exists at `C:\Users\eerhardt\ImageClassification.Train\assets\inputs\images\flower_photos_small_set\roses\295257304_de893fc94d.jpg`. So your code is appending `ImageClassification.Train\assets\inputs\images\flower_photos_small_set` one time too many. I think that is because each `ImageData` item's `ImagePath` is set to:
But you are also passing `ImageClassification.Train\assets\inputs\images\flower_photos_small_set` as `fullImagesetFolderPath` into `mlContext.Transforms.LoadRawImageBytes(..., imageFolder: fullImagesetFolderPath, ...)`.
Cutting out one of those duplicate sub-paths should fix the problem. I fixed it locally by passing `string.Empty` as the `imageFolder` argument of `LoadRawImageBytes`.
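The duplication can be reproduced without ML.NET. The sketch below assumes that `LoadRawImageBytes` resolves each file by joining `imageFolder` and the `ImagePath` value with `Path.Combine`, and skips the join when `imageFolder` is empty; `ImagePathDemo` and `Resolve` are hypothetical names used only for illustration. It prints the doubled path, then two ways of removing one copy: an empty `imageFolder` (the fix described above) or an `ImagePath` made relative to the folder with `Path.GetRelativePath`.

```csharp
using System;
using System.IO;

public static class ImagePathDemo
{
    // Assumed resolution rule (not copied from ML.NET): join folder and path
    // unless the folder is null, empty, or whitespace.
    public static string Resolve(string imageFolder, string imagePath) =>
        string.IsNullOrWhiteSpace(imageFolder) ? imagePath : Path.Combine(imageFolder, imagePath);

    public static void Main()
    {
        string folder = Path.Combine(
            "ImageClassification.Train", "assets", "inputs", "images", "flower_photos_small_set");

        // A relative image path that already starts with the folder, as you get
        // when enumerating files under a relative folder.
        string imagePath = Path.Combine(folder, "roses", "295257304_de893fc94d.jpg");

        // Problem: the folder prefix appears twice.
        Console.WriteLine(Resolve(folder, imagePath));

        // Fix 1: pass an empty imageFolder so the path is used as-is.
        Console.WriteLine(Resolve(string.Empty, imagePath));

        // Fix 2: keep imageFolder, but store ImagePath relative to it.
        Console.WriteLine(Resolve(folder, Path.GetRelativePath(folder, imagePath)));
    }
}
```

`Path.Combine` returns its second argument unchanged when that argument is rooted, so this kind of doubling only appears with relative image paths, which is the case here because `imagesDownloadFolderPath` is relative.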