Web scraping is the process of extracting data, information or images from a website using an automated method. Think of it as fully automatic copy and paste.
We either write or use an app that navigates to the websites we want and copies the specific things we want from them. This is much more precise than downloading the entire website.
Like any other tool, web scraping can be used for good or ill. Some of the better reasons to scrape websites are ranking them in a search engine based on their content, comparing prices, or monitoring stock market information. You can even use it as a kind of research tool.
How do I scrape websites with Excel?
Believe it or not, Excel has been able to retrieve data from websites for a long time, at least since Excel 2003. It’s just that most people don’t think about web scraping at all, let alone think of using a spreadsheet program to do the job. But it’s surprisingly easy and effective. Let’s learn how to do this by creating a collection of Microsoft Office keyboard shortcuts.
Find the sites that you want to scrape
The first thing we’re going to do is find the specific web pages from which we want to get information. Let’s go to the source and look at https://support.office.com/. We’re going to use the search term “frequently used shortcuts”. We can make it more specific by adding the name of a particular application like Outlook, Excel, Word, etc. It might be worth bookmarking the results page so that we can easily return to it.
Click the search result “Keyboard shortcuts in Excel for Windows.” Once on this page, find the list of Excel versions and click Newer Versions. We are now working with the latest and greatest.
We could go back to the search results page, open the results for all the other Office applications in their own tabs, and bookmark them. That’s a good idea, even just for this exercise. At this point, most people would stop at collecting the Office shortcuts, but not us. We’re going to put them in Excel so we can do whatever we want with them whenever we want.
Open Excel and Scrape
Open Excel and create a new workbook. Save the workbook as Office Shortcuts. If you have OneDrive, save it there so that AutoSave works.
After saving the workbook, click the Data tab.
On the ribbon of the Data tab, click From Web.
The From Web wizard window will open. This is where we put the web address, or URL, of the website from which we want to scrape data. Switch to your web browser and copy the URL.
Paste the URL into the URL field of the From Web wizard. We can use the wizard in Basic or Advanced mode. Advanced mode gives us a lot more options for accessing data from the website. For this exercise, we only need Basic mode. Click OK.
Excel will try to connect to the website. This may take a few seconds. If it does, we will see a progress window.
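For readers curious about what a connection step like this involves, here is a minimal Python sketch of downloading a page using only the standard library. It is an illustration, not a description of how Excel works internally. The helper names is_http_url and fetch_html, the 10-second timeout, and the User-Agent value are all assumptions made for the example.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def is_http_url(url: str) -> bool:
    """Return True when url has an http(s) scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its text (hypothetical helper)."""
    if not is_http_url(url):
        raise ValueError(f"not a web address: {url!r}")
    # Assumption: some sites reject requests that carry no User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (needs a network connection, so it is left commented out):
# html = fetch_html("https://support.office.com/")
```

The address is checked before any request is made, so a typo in the URL fails quickly instead of waiting for the timeout.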
A Navigator window will open and we will see a list of tables from the website on the left. By selecting one of them, we will see a preview of the table on the right. Let’s select the table of frequently used shortcuts.
We can click the Web View tab to see the actual website if we need to search for the table we want. When we find it, we can click on it and it will be selected for import.
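The idea of listing a page's tables and picking one can be sketched in Python with the standard library's html.parser module. This is only a rough illustration under stated assumptions: the TableCollector class, the extract_tables helper, and the SAMPLE_PAGE markup are made up for the example, and the sketch does not handle nested tables or cells that lack a closing tag.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html):
    """Return all tables in html as lists of rows (hypothetical helper)."""
    collector = TableCollector()
    collector.feed(html)
    return collector.tables


# Made-up sample markup standing in for a downloaded page.
SAMPLE_PAGE = """
<table>
  <tr><th>To do this</th><th>Press</th></tr>
  <tr><td>Copy</td><td>Ctrl+C</td></tr>
  <tr><td>Paste</td><td>Ctrl+V</td></tr>
</table>
<table>
  <tr><th>To do this</th><th>Press</th></tr>
  <tr><td>Undo</td><td>Ctrl+Z</td></tr>
</table>
"""

tables = extract_tables(SAMPLE_PAGE)
for index, table in enumerate(tables):
    # Show each table's header row and how many data rows follow it.
    print(index, table[0], len(table) - 1)
```

Picking a table by its index in the returned list plays roughly the same role as clicking a table name in the Navigator.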
Now we click the Load button at the bottom of this window. There are other options we could choose, but they are more complex and beyond the scope of our first scrape. Just know they are there. Excel’s web scraping capabilities are very powerful.
The web table will load into Excel in a few seconds. We will see the data on the left, where the number 1 is in the picture below. Number 2 indicates the query used to retrieve the data from the website. When we have multiple queries in the workbook, this is where we select the one we need.
Note that the data arrives in the spreadsheet as an Excel table. It is already configured so that we can filter and sort the data.
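To make the idea of filtering and sorting a table concrete, here is a small Python sketch that does the same two operations on rows held in memory. The sample rows, the column names action and keys, and the helpers filter_rows and sort_rows are assumptions for illustration, not something Excel exposes.

```python
# Sample rows standing in for a scraped shortcuts table.
shortcuts = [
    {"action": "Paste", "keys": "Ctrl+V"},
    {"action": "Copy", "keys": "Ctrl+C"},
    {"action": "Undo", "keys": "Ctrl+Z"},
    {"action": "Close a workbook", "keys": "Ctrl+W"},
]


def filter_rows(rows, column, text):
    """Keep the rows whose column contains text, ignoring case."""
    needle = text.lower()
    return [row for row in rows if needle in row[column].lower()]


def sort_rows(rows, column, descending=False):
    """Return the rows ordered by column, ignoring case."""
    return sorted(rows, key=lambda row: row[column].lower(), reverse=descending)


# Sort by action name, then look up the shortcut that uses Ctrl+V.
for row in sort_rows(shortcuts, "action"):
    print(row["action"], row["keys"])
print(filter_rows(shortcuts, "keys", "ctrl+v"))
```

Both helpers return new lists and leave the original rows untouched, much as a filter in Excel hides rows without deleting them.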
We can repeat this process for all other web pages that have the Office shortcuts we need for Outlook, Word, Access, PowerPoint, and any other Office application.
Keep scraped data current in Excel
As a bonus for you, we will learn how to keep the scraped data up to date in Excel. This is a great way to demonstrate how powerful Excel is for scraping data. Even so, we are only doing the most basic scraping that Excel can do.
For this example, let’s use a stock information web page like https://www.cnbc.com/stocks/
Follow the same steps as before, but copy and paste the new URL from the address bar.
You will be taken to the Navigator window and see the tables available. Let’s select the table of major US stock indices.
After the data is scraped, we will see the following table.
On the right we see the query for the major US stock indices. Select it so that it is highlighted. Make sure we are on the Table Tools tab and in the Design area. Then click the down arrow under the Refresh button. Then click “Connection Properties”.
In the “Query Properties” window, on the “Usage” tab, we can control how this information is refreshed. We can set a specific interval for refreshing, refresh the data the next time we open the workbook, refresh in the background, or use any combination of these. Once we have selected what we need, click OK to close the window and continue.
That’s it! Now you can track stock prices, sports scores, or any other data that changes frequently from an Excel spreadsheet. If you are good with Excel formulas and functions, you can do almost anything you want with the data.
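The refresh choices described above amount to a simple rule: refresh when the workbook opens, refresh when enough time has passed, or both. Here is a minimal Python sketch of that rule. The RefreshPolicy class and the refresh_due function are hypothetical names for illustration and do not correspond to any Excel API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RefreshPolicy:
    """Hypothetical stand-in for the refresh settings of one query."""
    every_minutes: Optional[int] = None  # None means no timed refresh
    on_open: bool = False                # refresh when the workbook opens


def refresh_due(policy, last_refresh, now, just_opened=False):
    """Return True when the data should be fetched again."""
    if just_opened and policy.on_open:
        return True
    if policy.every_minutes is None:
        return False
    return now - last_refresh >= timedelta(minutes=policy.every_minutes)


# Example: refresh every 5 minutes and also whenever the workbook opens.
policy = RefreshPolicy(every_minutes=5, on_open=True)
last = datetime(2024, 1, 1, 9, 0)
print(refresh_due(policy, last, last + timedelta(minutes=6)))
```

Keeping the two conditions independent mirrors the dialog, where each option can be switched on or off without affecting the other.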
Maybe try to spot market trends, launch a fantasy sports pool at work, or maybe just keep an eye on the weather. Who knows? Your imagination and the data available on the Internet are the only limits.