RapidMiner Studio free Data Science Tool

RapidMiner Studio provides a wealth of functionality to speed up and optimise data exploration, blending and cleansing tasks, reducing the time spent importing and wrangling your data. It offers an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. For some further information on RapidMiner Studio, see item 18 of the list.

This programme keeps popping up when you look for data analysis tools. It is similar to KNIME in that it uses nodes, but it has a different interface.

Go here for downloads and pricing for the free and other versions. Below is a link to YouTube tutorials on RapidMiner Studio:

I have reviewed about 7 of the 19 tutorial videos and they are pretty good. Simple, straightforward and show you what you need to get started.

Connections

I read somewhere that in the free version of RapidMiner Studio you couldn’t connect to a database, or to cloud storage such as AWS S3 or DropBox. I have just downloaded RapidMiner, and it gives you a 30-day trial with the full set of tools, then limits you to 10,000 rows of data. So far I have been able to link to my MS SQL database, to DropBox and also to a Twitter feed. I see that you can get a free 5GB AWS S3 account, so if I set that up I could connect to that too. An eclectic mixture of connections, but all useful. Apparently there is even a NoSQL MongoDB connection too.
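Because of the 10,000-row limit, one option is to cut a larger file down to size before loading it. The following is only a minimal sketch in plain Python, assuming a CSV file with a header row. The function names and the `ROW_LIMIT` constant are made up for illustration and are not part of RapidMiner. It uses reservoir sampling so that the rows kept are a random sample rather than just the first 10,000.

```python
import csv
import io
import random
from typing import Iterable, List, TextIO

ROW_LIMIT = 10_000  # row cap mentioned above (assumed value for this sketch)


def sample_rows(rows: Iterable[list], limit: int = ROW_LIMIT, seed: int = 0) -> List[list]:
    """Return at most `limit` rows chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir: List[list] = []
    for index, row in enumerate(rows):
        if index < limit:
            # Fill the reservoir with the first `limit` rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability limit / (index + 1).
            slot = rng.randint(0, index)
            if slot < limit:
                reservoir[slot] = row
    return reservoir


def shrink_csv(source: TextIO, target: TextIO, limit: int = ROW_LIMIT) -> int:
    """Copy the header plus a random sample of at most `limit` data rows."""
    reader = csv.reader(source)
    writer = csv.writer(target)
    writer.writerow(next(reader))  # assumes the first line is a header
    sampled = sample_rows(reader, limit)
    writer.writerows(sampled)
    return len(sampled)
```

With real files you would pass handles opened with `open(path, newline="")` as `source` and `target`. Note that the sampled rows do not keep their original order.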

Extensions

There are a few extensions you can download from the marketplace (free), but not as many as in KNIME (though I haven’t used most of the ones I’d downloaded for KNIME).

There seem to be a few text analytics tools and a web scraping tool I am interested in trying, as well as some connectors to financial data that may be worth exploring (the Python tutorials I was using to learn pandas used the Quandl API for connecting to financial datasets).

Interface

Display/Process interface

The interface has 2 main areas, the DESIGN area and the RESULTS area.

In the DESIGN area you connect to your DATA (Left Top Side Panel) and then have OPERATORS (Left Bottom Panel or Bottom Middle Bar) that act on that DATA.

The operators range from data cleaning to machine learning operations. It is “PLUG & PLAY” in that you connect the operators together to build up a process.

You then need to RUN the PROCESS that you have developed, and if it is connected to an OUTPUT SOCKET on the Right of the process area then the output will be displayed in the RESULTS Panel.

The Top Right Panel shows the parameters that you may need to fill in for the Operator you have currently selected, and the Bottom Right Panel gives a brief explanation of that Operator and what it expects.

One thing to note: if there are Operators in the Process area that are not plugged in to anything, RapidMiner pops up a warning that you need to do something about them.

In KNIME, only the nodes you tell it to run are activated, and you can run them one at a time or as a whole series, thereby controlling the processing of the data (great for testing, and quick). RapidMiner seems to be like FME in that it runs everything, so if you have some heavy processes it can take some time to run.
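The difference between the two styles can be pictured as running a chain of steps up to a chosen point versus always running the whole chain. This is a toy sketch only, not how KNIME, RapidMiner or FME are implemented. The step functions and `run_chain` are hypothetical names invented for the illustration.

```python
from typing import Callable, List, Optional, Sequence

Step = Callable[[List[int]], List[int]]


def run_chain(data: List[int], steps: Sequence[Step],
              stop_after: Optional[int] = None) -> List[int]:
    """Run steps in order; if stop_after is given, stop after that step index."""
    last = len(steps) - 1 if stop_after is None else stop_after
    result = data
    for step in steps[: last + 1]:
        result = step(result)
    return result


def drop_negatives(values: List[int]) -> List[int]:
    return [v for v in values if v >= 0]


def double(values: List[int]) -> List[int]:
    return [v * 2 for v in values]


def total(values: List[int]) -> List[int]:
    return [sum(values)]


steps = [drop_negatives, double, total]
raw = [3, -1, 4]

partial = run_chain(raw, steps, stop_after=0)  # node-at-a-time style: first step only
full = run_chain(raw, steps)                   # run-everything style: all three steps
```

Stopping early lets you inspect the intermediate result (`partial`) cheaply, which is the appeal of testing one node at a time when later steps are expensive.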

Results interface

After you have “RUN” your process it will take you to the Results tab, which shows you the output of your process. This can be displayed in a number of ways, depending on which tab you select in the left-hand sidebar.

The default one is a Tabulated page showing the output. You can select other methods of viewing the results as per the screenshots below.

The Statistics tab shows you information about your data, such as whether you have missing data and the ranges/values of the data. You can click on a particular column of data (shown as a row) and see more in-depth information on that particular aspect of the output. It is a good way to explore your data and output. You can also use this interface on your original dataset to do some tidying if you have, say, duplicates or poorly named information in a particular column, so it’s useful for data cleaning and preparation.

You also have a wide variety of Charts to display and explore your data and outputs, with both a simple Charts setup and an Advanced Chart setup where you can tailor your chart to display the information you want to convey in your chosen manner. As you can see there are a lot of basic chart types to visualise your data.
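To give a feel for the kind of per-column summary described here (missing values, ranges, duplicates), this is a rough equivalent in plain Python. It is a sketch under my own assumptions, not RapidMiner's actual calculation. The sample rows and the function names are invented for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def column_summary(rows: List[Row], column: str) -> Dict[str, object]:
    """Count missing and distinct values; add min/max if every value is numeric."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    summary: Dict[str, object] = {
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        numbers = []  # at least one non-numeric value: treat the column as text
    if numbers:
        summary["min"] = min(numbers)
        summary["max"] = max(numbers)
    return summary


def duplicate_rows(rows: List[Row]) -> List[Row]:
    """Return every row that exactly repeats an earlier row."""
    seen = set()
    dupes = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            dupes.append(row)
        else:
            seen.add(key)
    return dupes


# Invented sample data: one missing value, one inconsistent name, one duplicate.
rows = [
    {"city": "Leeds", "temp": "12.5"},
    {"city": "leeds", "temp": ""},
    {"city": "York", "temp": "9"},
    {"city": "Leeds", "temp": "12.5"},
]

temp_stats = column_summary(rows, "temp")
city_stats = column_summary(rows, "city")
repeats = duplicate_rows(rows)
```

Here the distinct count for `city` exposes the "Leeds" versus "leeds" inconsistency, which is the sort of poorly named value the Statistics view helps you spot.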

Once you have chosen your chart type (Scatter, bar, pie etc) then you can choose, based on chart type, what data you want to display.

The Advanced chart setup is for tailoring your bespoke chart for export.

To export charts, go to FILE and choose Print/Export Images; this brings up the Print pop-up box.

In the Print/Export pop-up choose output destination and type. For graphs you can export to PNG, JPEG, SVG & PDF.

If you want to export data, you can add a SAVE operator to your process that will output tabulated data to a CSV or EXCEL file (or write it to a database).

End comment

I haven’t actually got a current project that requires either KNIME or RapidMiner, but I will try RapidMiner out on the next project I have.

I like its interface and also the output tabs, where you can quickly see your data in Statistical and Chart views. A very nice interface. One thing I find with KNIME is that you have to constantly click on nodes to see the results as you are setting up the process; this is so much easier. That being said, I really like the way KNIME allows you to test a NODE at a time, so you gradually build up what you want.

The other thing I like is that, from the videos I have watched, it does the pre-prep of data cleaning quite nicely too. A bit like OpenRefine but all in one programme. I look forward to using this on a project. 
