This post was prompted by the Weka tutorial on image processing. I worked through the video and then decided to run the same datasets through the different analytics tools to see which were easiest to set up, to use, and to review results in.
Weka Image Processing
In the Advanced Weka videos there was one on Image Processing, see below.
I followed the steps below.
Following the steps, I got the same results as the video. The confusion matrix at the bottom shows how many images of each class were correctly classified.
With the output predictions option switched on, the results panel shows the wrongly predicted images, but because this run uses 10-fold cross-validation it takes some time to work out which actual images were misclassified.
Switching from 10-fold cross-validation to Use Training Set gives better results, which is to be expected because the model is then tested on the same images it was trained on. In fact, only one image was incorrectly identified, that of a butterfly.
The results show that it was instance #8 that was incorrectly identified, so we can go and look in the file to see which image this was.
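As a rough sketch of that lookup, the snippet below reads ARFF text and returns the filename of a given instance. It assumes the first field of each data row is the image filename and that instance numbers follow the row order of the file, which should hold for a training-set run but not for the per-fold numbering under cross-validation. The sample relation and filenames are made up for illustration.

```python
def data_rows(arff_text):
    """Return the data rows of an ARFF file, skipping blank lines and % comments."""
    rows, in_data = [], False
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not in_data:
            # Everything before @data is header and is ignored here.
            in_data = stripped.lower() == "@data"
            continue
        if stripped and not stripped.startswith("%"):
            rows.append(stripped)
    return rows


def filename_for_instance(arff_text, instance_number):
    """Look up the first field (assumed to be the filename) of a 1-based instance."""
    row = data_rows(arff_text)[instance_number - 1]
    return row.split(",")[0].strip("'\"")


# Hypothetical sample; real filenames will differ.
SAMPLE = """@relation butterfly_vs_owl
@attribute filename string
@attribute class {BUTTERFLY,OWL}
@data
mno001.jpg,BUTTERFLY
mno002.jpg,BUTTERFLY
owl001.jpg,OWL
"""

print(filename_for_instance(SAMPLE, 2))
```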
If you wanted to take it out of the dataset, you would need to edit the .arff file and delete this image's entry before re-running the programme to see whether the accuracy improves. You could then use the image as part of a test set to see if it fares any better in that part of the process.
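Rather than editing the file by hand, the removal could be scripted. This is a minimal sketch, assuming one instance per line after the @data marker; the function name and the sample content are hypothetical, and it works on text in memory so the original file is left alone until you choose to write the result out.

```python
def drop_instance(arff_text, instance_number):
    """Return ARFF text with the 1-based data row removed; the header is untouched."""
    kept, in_data, seen = [], False, 0
    for line in arff_text.splitlines():
        stripped = line.strip()
        if in_data and stripped and not stripped.startswith("%"):
            seen += 1
            if seen == instance_number:
                continue  # skip the unwanted instance
        elif stripped.lower() == "@data":
            in_data = True
        kept.append(line)
    if instance_number < 1 or seen < instance_number:
        raise IndexError(f"no instance number {instance_number}")
    return "\n".join(kept) + "\n"


# Hypothetical sample; real filenames will differ.
ARFF_SAMPLE = """@relation butterfly_vs_owl
@attribute filename string
@attribute class {BUTTERFLY,OWL}
@data
mno001.jpg,BUTTERFLY
mno002.jpg,BUTTERFLY
owl001.jpg,OWL
"""

print(drop_instance(ARFF_SAMPLE, 2))
```

Writing the result to a new file name keeps the original dataset available for the later test-set run.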
Using the Vehicles set, which is split into 20 cars, 20 planes and 20 trains, J48 evaluated on the TRAINING SET gives 96.7% accuracy. Looking at the incorrectly assessed images, Plane/Car (im 36) and Train/Plane (im 42), I cannot see why the classifier chose these, as they both seem fairly typical of their class.
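To show how the 96.7% figure falls out of the confusion matrix, here is a small worked sketch. The matrix is reconstructed from the numbers above, not copied from Weka's output, and it assumes "Plane/Car" means an actual plane predicted as a car and "Train/Plane" means an actual train predicted as a plane.

```python
CLASSES = ["car", "plane", "train"]

# Rows are actual classes, columns are predicted classes (assumed layout).
confusion = [
    [20, 0, 0],   # cars: all 20 correct
    [1, 19, 0],   # planes: one predicted as a car (im 36)
    [0, 1, 19],   # trains: one predicted as a plane (im 42)
]


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def misclassified(matrix, classes):
    """List (actual, predicted, count) for every non-zero off-diagonal cell."""
    return [
        (classes[a], classes[p], matrix[a][p])
        for a in range(len(classes))
        for p in range(len(classes))
        if a != p and matrix[a][p]
    ]


print(f"{accuracy(confusion):.1%}")  # 58 of 60 correct -> 96.7%
print(misclassified(confusion, CLASSES))
```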
Knime Image Processing
The video below shows where Knime seems to be putting its effort with image processing: more towards biology, chemistry and cellular image processing.
There are a lot of example workflows for image processing (top left of image below), some good image viewers, and a few nodes associated with image processing (bottom left of image below). I have not yet found a simple image clustering workflow like the Orange one, so I have not been able to test the image training files for clustering.
I had an attempt at setting up a workflow but found I was spending a lot of time without getting very far. As this was only a simple cross-test across the analytics programmes, I did not take it any further.
Orange Image Processing
I quickly connected Orange up as per the Orange video on the process, using the Weka butterfly/owl dataset (then tried the Weka vehicle dataset too), and found that it correctly grouped all 100 images into 2 clusters (see below). It was easy to set up, easy to use, and easy to check the image output. By far the easiest process to use.
With the vehicle set, Orange also grouped the images into 3 correct clusters of cars, planes and trains (see ID numbers on the right: 1-20, 21-40, 41-60).
RapidMiner Image Processing
At the end of the video below, it talks about image processing with the Burgsys add-in, which can be downloaded here. There is a setup readme file on how to install it; you also need to copy the Data file for the processes and the images attached to the processes.
After looking through the processes that come with the add-in package, there does not seem to be one for image clustering like the Orange package. I had a wee play with the RapidMiner image processes but, as with Knime, I explored and did not get very far. I’m not sure where I would use this particular image processing workflow.
Knime and RapidMiner are a challenge to set up for this comparative example, and I didn’t actually get a working process for either of them.
Weka needs an ARFF file set up to read alongside the images. You also have to interpret the output to identify which images were being misinterpreted and see whether there is a reasonable explanation for it.
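Setting up that ARFF file can be scripted. The sketch below builds one from a list of (filename, class) pairs. The two-attribute layout (a string filename plus a nominal class) is my assumption of what the image filter expects, and the relation name and filenames are invented for the example, so check the output against the ARFF files supplied with the tutorial before relying on it.

```python
def build_arff(relation, labelled_files):
    """Build ARFF text from (filename, class_label) pairs.

    Assumed layout: a string attribute for the filename and a nominal class.
    """
    classes = sorted({label for _, label in labelled_files})
    lines = [
        f"@relation {relation}",
        "@attribute filename string",
        "@attribute class {" + ",".join(classes) + "}",
        "@data",
    ]
    lines += [f"{name},{label}" for name, label in labelled_files]
    return "\n".join(lines) + "\n"


# Hypothetical filenames for illustration only.
images = [
    ("car01.jpg", "CAR"),
    ("plane01.jpg", "PLANE"),
    ("train01.jpg", "TRAIN"),
]

print(build_arff("vehicles", images))
```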
Orange was easy to set up, easy to run and easy to view the results.
The takeaway from this is to go to Orange first if you are going to do some basic image clustering.
Room condition processing
I was wondering whether I could use image clustering to interpret the condition of different finishes inside a room, such as wall finishes and when they would need repainting.
There would be a few challenges when training the model, such as different wall colours and different lighting conditions. I also wondered if you could use panorama photos too.
This information would be apparent when the person takes the photograph. So perhaps another way is to use image metadata and put the information into the image at the time the photo is taken, or shortly afterwards. Here is a link to an article on How to add metadata to an image in Windows. If you had a set of images, you could have an Excel file listing the image files, with a comment classification that you embed into each image based on an image assessment. For example, Image x.jpg would carry the comment metadata “walls condition 2, floor condition 1, ceiling condition 3” etc.
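To make the comment format concrete, here is a small sketch that builds and parses a condition comment of that shape. It only handles the text itself; writing it into, or reading it out of, the image file is left to whichever metadata tool is used. The function names and the exact "surface condition score" wording are assumptions for illustration.

```python
import re


def format_comment(conditions):
    """Turn {'walls': 2, 'floor': 1} into 'walls condition 2, floor condition 1'."""
    return ", ".join(
        f"{surface} condition {score}" for surface, score in conditions.items()
    )


def parse_comment(comment):
    """Extract {surface: score} pairs from a comment string, tolerating stray spaces."""
    pairs = re.findall(r"(\w+)\s+condition\s+(\d+)", comment)
    return {surface: int(score) for surface, score in pairs}


comment = format_comment({"walls": 2, "floor": 1, "ceiling": 3})
print(comment)
print(parse_comment(comment))
```

Keeping the comment as plain readable text means it can still be checked by eye in the file properties, while a simple pattern is enough to pull the scores back out later.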
This could then be extracted later with a Knime process for the data. It is only workable if it is easy to embed and extract data from the images, so it might be worth an exercise. At the same time you could also put the room code and other relevant data into the file metadata. In fact, I did the exercise on metadata; see article here.