So I googled for “how to get list of total indexed pages for a website on google” because I’d just launched our new site and wanted to clean up a bunch of the old data and put 301 redirects in place to get in Google’s good graces.
I ran across an awesome post by Matthew Porter which describes perfectly what I want to accomplish.
Sweet! I follow the instructions, but when I enter my domain into the Google Docs spreadsheet I get an “Unable to fetch URL” error. Boo.
It seems Google no longer allows fetching these URLs from within Google Docs. I’m guessing it’s because of the shit-bags that abuse my beloved inter-webs.
Just when I thought I was totally screwed, I thought of a possible workaround (my simpleton/ADD brain comes in handy from time to time).
*Note: if you have a lot of URLs this method is going to suck – but hey… it’s better than nothing, right?
Google won’t allow fetching URLs from within Google Docs as Porter’s tutorial suggests, but there is nothing preventing you from pulling the max SERP results, viewing the source code, and uploading the page to your own website.
Here’s the process:
Open a browser and paste in the following URL to list the indexed pages for your site (Google will only let you pull a max of 100 results per page). Yes, change the domain name to yours, Sparky.
https://www.google.com/search?q=site:absolute0.net&num=100
You’ll get a results page listing your indexed URLs. From there:
- Next, view the source code: the shortcut on Windows is Ctrl+U
- Copy all the code
- Open up an HTML editor and create a new file: we’ll call it “google-serp-page-1.html”
- Paste the code into that file, save it, and put it up on your server.
- Go back to the google results page, scroll to the bottom and click the next page of results.
- Repeat steps 1-4 with the only change being the name of the file (so page 2 would be “google-serp-page-2.html” and so on…)
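If you have more than a couple of pages, it helps to have a checklist of which search URL goes with which file name. Here's a rough Python sketch that only builds that list; it does not fetch anything from Google, so you still open each URL in your browser and save the source by hand. The function name `serp_pages` and the `total_results` argument are made up for illustration, and the `start` offset parameter is my assumption about what the "next page" link adds to the URL, so check it against what your browser actually shows.

```python
from urllib.parse import urlencode


def serp_pages(domain, total_results, per_page=100):
    """Return (search_url, local_filename) pairs, one per results page."""
    pages = []
    for page, start in enumerate(range(0, total_results, per_page), start=1):
        params = {"q": f"site:{domain}", "num": per_page}
        if start:
            # Assumption: "start" is the offset of the first result on the page.
            params["start"] = start
        # Keep the colon in "site:" readable instead of percent-encoding it.
        url = "https://www.google.com/search?" + urlencode(params, safe=":")
        pages.append((url, f"google-serp-page-{page}.html"))
    return pages


# Example: a site with roughly 250 indexed pages needs three saved files.
for url, filename in serp_pages("absolute0.net", 250):
    print(filename, url)
```

The file names match the “google-serp-page-N.html” pattern used above, so the list doubles as a to-do list for the manual steps.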
NOW we can follow the rest of Porter’s genius post!
Open a Google Docs spreadsheet and, in the first cell (A1), paste the following formula:
=IMPORTXML("https://www.a0web.com/google-serp-page-1.html", "//h3/a/@href")
Hit return and BOOM! Your day just got a bit brighter.
Scroll to the last entry and, in the next empty cell, repeat with the page name changed:
=IMPORTXML("https://www.a0web.com/google-serp-page-2.html", "//h3/a/@href")
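If the spreadsheet comes back empty, it can help to check a saved page locally before blaming the formula. Below is a minimal Python sketch that mimics what the `//h3/a/@href` XPath asks for: the `href` of every `<a>` that sits directly inside an `<h3>`. It uses only the standard library's `html.parser`, not a real XPath engine, so treat it as an approximation. The names `H3LinkParser` and `extract_h3_links` and the sample HTML are made up for illustration; real result pages are much messier.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class H3LinkParser(HTMLParser):
    """Collect href values of <a> tags that are direct children of <h3>."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.hrefs = []   # matching links, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.stack and self.stack[-1] == "h3":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def extract_h3_links(html_text):
    parser = H3LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


# Tiny stand-in for a saved results page (hypothetical markup).
sample = """
<html><body>
<h3><a href="https://example.com/page-one">Page one</a></h3>
<p><a href="https://example.com/not-a-result">Skipped</a></p>
<h3><a href="https://example.com/page-two">Page two</a></h3>
</body></html>
"""
print(extract_h3_links(sample))
```

To try it on a real file, read “google-serp-page-1.html” into a string and pass that to `extract_h3_links`. If it returns nothing, the links in your saved page probably aren't direct children of `<h3>` tags, and the XPath in the formula would need adjusting to match.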
Happy scraping!