Wednesday, April 17, 2013

PageCrawler

So a couple of days ago I was browsing a forum and came across a board with tons of links to computer-science eBooks. One of them pointed to Security Engineering by Ross Anderson, whose web page links to every chapter as a separate PDF file. Downloading each PDF by hand would take far too long, so I wrote a tool to do it for me.



PageCrawler lets me download all of them at once. It reads a web page, parses the HTML to find hyperlinks (specifically their href attributes), and compiles a list of the ones that match the file types I'm looking for. Click "Save", select the directory to save to, and voila!
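PageCrawler's own source isn't shown here, so what follows is only a minimal Python sketch of the same idea using the standard library's html.parser: collect every href on <a> tags, resolve it against the page URL, and keep the links whose path ends in one of the wanted extensions. The names HrefCollector, find_links and download_all are hypothetical, chosen for this illustration.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collects the value of every href attribute on <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_links(html, base_url, extensions):
    """Return absolute URLs of links whose path ends with one of `extensions` (e.g. ".pdf")."""
    parser = HrefCollector()
    parser.feed(html)
    wanted = tuple(ext.lower() for ext in extensions)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # turn relative links into absolute ones
        # Check only the path, so a query string doesn't hide the extension.
        if urlparse(url).path.lower().endswith(wanted) and url not in links:
            links.append(url)
    return links


def download_all(urls, directory):
    """Save each URL into `directory`, named after the last segment of its path."""
    os.makedirs(directory, exist_ok=True)
    for url in urls:
        name = os.path.basename(urlparse(url).path)
        urllib.request.urlretrieve(url, os.path.join(directory, name))
```

For a chapter index like the one described above, find_links(page_html, page_url, [".pdf"]) would give the list of chapter PDFs, and download_all(links, "SecurityEngineering") would play the role of the "Save" button.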

I'm working on an "Extensions" button whenever I find time between school, work, and homework. It will parse the page and list every file extension it finds in the "File Extensions" text box.
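One possible way the "Extensions" feature could work, sketched here in Python as an assumption rather than a description of the actual PageCrawler code: gather the hrefs the same way, take the extension of each link's URL path, and return the distinct ones. The function name list_extensions is hypothetical.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collects the value of every href attribute on <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_extensions(html, base_url=""):
    """Return the sorted, lowercased set of file extensions found in the page's links."""
    parser = HrefCollector()
    parser.feed(html)
    found = set()
    for href in parser.hrefs:
        path = urlparse(urljoin(base_url, href)).path
        # URL paths always use "/", so posixpath is the right splitter here.
        ext = posixpath.splitext(path)[1].lower()
        if ext:  # skip links like "/about/" that have no extension
            found.add(ext)
    return sorted(found)
```

The result could be joined with commas to fill the "File Extensions" text box, letting the user trim it down before saving.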

A friend of mine suggested grabbing files from a range of web pages, so I'll implement that as well (in time).

I'll post to SourceForge this weekend!
