Web Data 1. Capturing Web Data via Google Sheets and Displaying It on a Web Page


The last few posts have looked at pulling data from the web into Excel, or using APIs, and then producing some graphics from the data.

Excel is a bit static. Once you have captured the data from the web, you have to send someone your workbook for them to view it, and they may not refresh the data. Or, if you want to display the data on a web page, you can take an image and post it, but that too will be static.

So we turn to APIs. That's great if the people creating/sharing the data provide an API for you to link into, and if it isn't too expensive to use; then you can display dynamic data on your web page.

But what if there is updating data on a web page that you want to extract and display on your own web page, and there is no API to link into? Well, if you wanted to display their data raw, you could use an iframe container and put their HTML link into it to display their data, with an acknowledgement of the source and their approval.

Maybe, though, you want to filter their data and also display it in a different way. How do you do that?

After exploring APIs I came to the conclusion that there are fewer of them than I'd like, so I needed another method to capture some data. A lot of websites update their data regularly, for example weather sites. With MetService NZ a lot of the data is free, just bloody hard to access.

I walk my dog down at Lyall Bay. He much prefers low tide, as we can bounce the ball that he chases on the hard sand. So I thought, how do I get tide data around Wellington? Because of the shape of the harbour it has some weird tides; there is a marked difference between Kapiti, the Wellington south coast, Wellington City and the Wairarapa. So, as an exercise, how do I capture tide data? MetService does not have an API, so I thought I would grab data off a website with tide data. It would be regularly updated (I presume daily), so the data is changing. How to capture it?

I had come across some articles that talked about Google Sheets updating data and then linking that as a source to a web page, so I decided I would give that process a go.

I had done a similar process with Excel, keeping an Excel worksheet on OneDrive and embedding a link to it in a web page. Here is one of the examples, and here is the article about the process. The data in that case was static. You could use Power Query to refresh the Excel data automatically at specific intervals (I'm not sure if it will update on a call from a web page).

Anyway, I've played with Excel, so it's time to break something else: Google Sheets.

A couple of videos I found useful for the =IMPORTXML and =IMPORTHTML processes are:

IMPORTHTML, IMPORTXML Functions – Google Sheets Tutorial to Extract from Web Pages to Spreadsheets 2

XPath query, IMPORTXML & Google Sheets – Advanced Tutorial

Process

You'll need a Google account to get Google Docs, where you can "create and edit web-based documents, spreadsheets, and presentations. Store documents online and access them from any computer."

Next, find the web page that you want to get some data from. For my first attempt, I went to the MetService tides page for Wellington:

https://www.metservice.com/marine/tides/wellington

I tried grabbing the table following the pattern of the =IMPORTXML("https://en.wikipedia.org/wiki/Moon_landing", "//a/@href") example, but had little success with either the IMPORTXML or the IMPORTHTML process on the MetService page, so I cut my losses and tried another site:

https://www.tide-forecast.com/locations/Wellington-New-Zealand/tides/latest

where I found some more tide data for Wellington.

In a cell, type =IMPORTHTML("https://www.tide-forecast.com/locations/Wellington-New-Zealand/tides/latest","table",1). The function takes three arguments: the web page URL, either "table" or "list", and the index of that table or list on the page.
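If you only want a single value rather than the whole table, IMPORTXML with an XPath query can target it. As a sketch (the XPath below is my assumption about the page structure, not something verified against the live page), the first data cell of the first table would be:

    =IMPORTXML("https://www.tide-forecast.com/locations/Wellington-New-Zealand/tides/latest","//table[1]//tr[2]/td[1]")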

This post has a really nifty way of querying the site, using your browser console command line to find how many tables there are on a web page:

If the website you are importing from includes multiple tables, open your Developer Tools and run the following code in the console:

    var i = 1;
    [].forEach.call(document.getElementsByTagName("table"), function(x) {
      console.log(i++, x);
    });

To find the index associated with the table you want to display, mouse over the results until the table you want is highlighted. The number shown in the results is the table number you'll need to use.
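The same trick works for lists, which IMPORTHTML can also target. A minimal variant of my own (not from the post quoted above) that numbers every ul and ol on the page:

    var j = 1;
    [].forEach.call(document.querySelectorAll("ul, ol"), function(x) {
      console.log(j++, x);
    });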

So I'm able to get the table data into the Google Sheet.

You can do a bit of processing by hiding columns and altering data types. Not sure about graphing in Google Sheets at this point, but you can massage your data.
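One way to massage the imported table without touching the raw data is the QUERY function, pointed at the sheet holding the import. This is a sketch: the sheet name and column letters are assumptions about where the import landed, not taken from my actual workbook.

    =QUERY(Sheet2!A:D, "select A, C where A is not null", 1)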

Then create a sharable link:

https://docs.google.com/spreadsheets/d/1RDyMETi8NbFOWj_AQHRukBFNZkLUGtcQtLx9gkhQIgI/edit?usp=sharing

There is this post on how to put that link into an iframe, so that you are looking at the spreadsheet.

I revised the data by calling the table from the web into Sheet2 as the raw data, then displaying the modified table on Sheet1. The iframe opens on the first sheet by default.

I also brought in a chart of tide heights over time/date, which is posted at the top of the article. I found that, with Google Sheets, you can share a chart directly and put it into its own iframe. If you size the graph and the iframe to match, you can get rid of the scroll bars on the web page, which is definitely neater than the embedded sheet.
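For reference, the chart embed ends up looking something like the snippet below. The URL shape is what Publish to the web produces for a chart; the <published-id> and <chart-id> placeholders and the dimensions are assumptions to be replaced with your own values.

    <iframe src="https://docs.google.com/spreadsheets/d/e/<published-id>/pubchart?oid=<chart-id>&format=interactive"
            width="600" height="400" frameborder="0" scrolling="no"></iframe>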

Automatic Updating

I checked the sheet the next morning and the data still showed the day before. Then I realised that it was an international site, so the date had not yet changed where the server was.

But I looked up how to do an automatic update and found this Geckoboard article (go down the list in the left sidebar and choose "How to Configure Automatic Updates"). I have pasted part of it below.

The ImportHTML function does not automatically update the table or list from the source, even if the data on the source web page changes. To address this, we’ve written a script you can adapt to update your imported data automatically, even when the Google Sheet is closed.

Step 1: Write a Script

  1. In a new Google Sheet, select the Tools menu and click Script Editor. A new Google Apps Script tab will open.
  2. Replace the default code with the following script:
    function getData() {
      // The sheet (tab) that holds the import formula.
      var sheetName = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("<sheet-name>");
      // A random query string makes the URL look different on every run,
      // so IMPORTHTML re-fetches the page instead of returning its cached result.
      var queryString = Math.random();
      var cellFunction = '=IMPORTHTML("<url>?' + queryString + '","table",<index>)';
      sheetName.getRange('<import-cell>').setValue(cellFunction);
    }
  3. Replace <sheet-name> with the name of your sheet (the tab name, not the name of the file).
  4. Replace <url> with the URL of your web page, e.g. https://www.bloomberg.com/markets/currencies/cross-rates.
  5. Replace <index> with the table position on the page, e.g. 1.
  6. Replace <import-cell> with the cell where you want to put your import statement, e.g. A1.
  7. In the title field, add a name to your project.
  8. Next, select File, then choose Save.
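The list above only saves the script; to run it on a schedule you also need a trigger. The Geckoboard article sets this up through the Script Editor UI, but as a sketch, a time-driven trigger can also be created in code (the hourly interval is my assumption; pick whatever refresh rate suits the source):

    function createHourlyTrigger() {
      // Run getData() once an hour, even while the spreadsheet is closed.
      ScriptApp.newTrigger("getData")
        .timeBased()
        .everyHours(1)
        .create();
    }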

MetService tables: no access!

I am particularly interested in the MetService (NZ) tide data around Wellington, as the high/low tide times vary quite a lot around the coast and MetService has several points of data collection.

Unfortunately, neither Google Sheets nor Excel's From Web worked on this site. I also used the call in the Firefox console (see below), which reports that the table exists (table 1).

And looking between the tags, the data is there (developer console Inspector tab below).

In Excel, though, if you look at the Web View tab in the Navigator, there is no table shown at all! The only odd thing is the yellow warning that the page uses Internet Explorer's compatibility mode.

In Postman, all you get is the web page and the table headers.
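That combination of symptoms suggests the table is built by JavaScript in the browser rather than delivered in the page's raw HTML, which would explain why IMPORTHTML (which only fetches the HTML and never runs scripts) comes back empty. A quick check, run in the console on the page itself, is to compare the rendered DOM against the raw HTML the server sends:

    // Tables in the rendered DOM (after scripts have run):
    console.log("rendered:", document.getElementsByTagName("table").length);

    // Tables in the raw HTML as served (roughly what IMPORTHTML sees):
    fetch(location.href)
      .then(function (r) { return r.text(); })
      .then(function (html) {
        console.log("raw:", (html.match(/<table/g) || []).length);
      });

If "rendered" is non-zero but "raw" is zero, no formula-based import will ever see the table.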

OutWit, the web scraping tool used in this article, seems to show separate lines of data, and in the right-hand panel you can display that data as SQL, HTML, XML or JSON, so I'm not actually sure how the information is brought into the page.

Now, I could export to SQL or CSV and refresh the data every 50 lines, but that is not a dynamic process; it involves manually downloading files from OutWit.

I'm not too sure what other resources I can throw at this. It seems that not all tables are made equal: some you can access, and others you cannot. This will have to remain a mystery for the time being. I definitely cannot get data from this one dynamically.

End comment

This process could have been done in Excel (I need to check that), and I wonder whether Power BI would do it in the free version (not sure if it is free, see post).

Overall, a good way to get some data from the web and display it in an altered format. In Excel, you could definitely do some graph work. Maybe the next post should be on graphics from both Google Sheets and Excel, posting a graph on a web page via an iframe with an update capability.

The data comes from the web and is brought into Google Sheets or Excel. At this point it is transformed via some processing, data cleaning and visualisation; then, via the iframe, the results are presented on the web page. So we are using Google Sheets or Excel as the processor that turns raw data into a presentation of the results.

I need to ensure that the transformation is robust: automatic updating of the data should not break the process. I need to test it.
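As a sketch of what that test might look like, the refresh script could re-read the import cell after forcing a recalculation and log whenever it comes back empty or as an error. The placeholders follow the Geckoboard script above, and the ten-second pause is an assumption; IMPORTHTML may take longer to resolve.

    function getDataChecked() {
      var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("<sheet-name>");
      var cell = sheet.getRange("<import-cell>");
      cell.setValue('=IMPORTHTML("<url>?' + Math.random() + '","table",<index>)');
      SpreadsheetApp.flush();   // push the formula to the sheet
      Utilities.sleep(10000);   // give the import time to resolve (assumed sufficient)
      var result = cell.getValue();
      if (result === "" || String(result).indexOf("#") === 0) {
        // An empty cell or an error value such as #N/A means the refresh broke.
        Logger.log("Import failed at " + new Date() + ": " + result);
      }
    }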

Also, the iframe comes with bottom and side scroll bars, which do not look very pretty.

I have not found anything in Google Sheets about sharing only specific cells of a sheet with a web page. Maybe you can only share the whole sheet.
