You are correct in your assumption that Scrivener only downloads the initial page you point it at. Going any further is a fairly involved problem, actually. It’s one thing to tell a downloader that you only want to follow links that refer back to the same area of a website, but that very well might not get everything you want. You can loosen the criteria for what you’ll download, but every time you do, the risk of getting far more than you intended increases. For instance, it’s not even safe to assume “same website”: consider someone downloading a blog hosted at blogger.com, where a same-site rule would happily wander off into the millions of unrelated blogs sharing that domain. And opening things wide up to any old link whatsoever will inevitably end with you shutting the program down before it fills up every molecule of your hard drive trying to download a substantial chunk of the World Wide Web. That last word is very appropriate.
So the solution is a plethora of options which must be constantly fiddled with. No two bulk downloads are the same, and each requires a degree of playing around with level thresholds, type filters, total file limits, interval masking, connexion multiplexing, and domain/directory limiters to get the result just right. Whole programs are devoted to the problem, and they work with varying levels of effectiveness. Not everyone wants people to download their whole site (or even parts of it), and such sites publish restrictions (typically a robots.txt file) that these sorts of programs are supposed to honour; some programs let you ignore them at your own risk. Those restriction files are often there for a good reason, as it is possible to get stuck in download loops on pages whose “names” change every time the program requests them. Nowadays the job is becoming increasingly difficult as web designers ditch years of good standards and build their sites out of a pile of dark magic that only works in the most mundane and mainstream of uses.
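To give a sense of what those knobs actually control, here is a minimal sketch in Python of the loop at the heart of any such downloader. Everything in it is an assumption to be tuned per site: the depth limit, the page cap, the one-second delay, and especially the naive same-host rule, which is exactly the check that goes wrong on shared hosts like blogger.com.

```python
import time
import urllib.parse
import urllib.request
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_depth=2, max_pages=200, delay=1.0):
    """Breadth-first crawl that refuses to leave the starting host.

    max_depth -- the level threshold: how many links deep to follow
    max_pages -- the total file limit: a hard stop on downloads
    delay     -- the interval mask: a polite pause between requests
    """
    start_host = urllib.parse.urlparse(start_url).netloc
    seen = {start_url}
    queue = [(start_url, 0)]
    pages = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.pop(0)
        try:
            with urllib.request.urlopen(url) as response:
                html = response.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # dead link, non-HTML content, refused connexion, etc.
        pages[url] = html
        if depth >= max_depth:
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urllib.parse.urljoin(url, href)
            # The domain limiter: this one check is all that stands
            # between you and mirroring half the Web, and it is also
            # the check that misfires on shared hosts like blogger.com.
            if urllib.parse.urlparse(absolute).netloc != start_host:
                continue
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
        time.sleep(delay)
    return pages
```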
Some things to investigate:
- wget (a UNIX tool, but it has some decent Mac OS X front-ends; see the sketch below for a typical invocation)
- Blue Crab
And if you already own DEVONthink Pro, it has a very basic deep downloader built-in that might work for you.
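If you end up trying wget, a conservative first attempt looks roughly like the following, shown here as a commented Python call so each flag can be annotated (you could equally type the same flags straight into Terminal). The flags are real wget options, but the domain, depth, wait, and start URL are placeholders to adjust per site.

```python
import subprocess

# A conservative, polite mirror of one area of a site. The flags are
# standard wget options; every value is a placeholder to tune per site.
subprocess.run([
    "wget",
    "--recursive",            # follow links, not just the first page
    "--level=3",              # level threshold: stop three links deep
    "--no-parent",            # directory limiter: never climb above the start URL
    "--domains=example.com",  # domain limiter: ignore off-site links
    "--wait=1",               # interval mask: pause a second between requests
    "--page-requisites",      # also fetch the images and CSS each page needs
    "--convert-links",        # rewrite links to point at the local copies
    "--adjust-extension",     # save pages with .html so a browser opens them
    "--directory-prefix=site-mirror",  # dump everything under ./site-mirror
    "http://example.com/blog/",        # hypothetical start page
], check=False)
```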
The next problem is what to do with the content once you download it. Many of these tools just dump several thousand files into a directory and let you sort it out. Not, in any way imaginable, useful for Scrivener. Some of the more Mac-savvy programs might give you options to store everything as a series of modified webarchive files, but they won’t link to one another in the way you desire, at least not in Scrivener. The best a downloader can do is rewrite every single URL it establishes a download link for, so that it points to the local copy instead of the original address. Once you drag those files into Scrivener, their names will be changed and they will be relocated inside the project file, breaking all of those rewritten links; and that is even supposing Scrivener could handle one imported webpage linking to another within the project (it can’t, last time I checked).
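To make that last point concrete, the rewriting step amounts to something like this sketch. The url_to_path map is hypothetical, standing in for whatever record the downloader kept of which URL was saved to which file, and a real implementation also has to cope with relative links, anchors, and query strings.

```python
import re

def localise_links(html, url_to_path):
    """Point href/src attributes at local copies of downloaded pages.

    url_to_path is a hypothetical map built while saving files, e.g.
    {"http://example.com/about.html": "about.html"}.
    """
    def swap(match):
        attr, quote, url = match.groups()
        local = url_to_path.get(url)
        return f"{attr}={quote}{local or url}{quote}"

    return re.sub(r'(href|src)=(["\'])([^"\']+)\2', swap, html)
```

Those rewritten paths only work for as long as the files keep their names and stay where they were saved, which is precisely what Scrivener’s import does not preserve.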
Your best option might be to just leave the download directory intact and link to it with a Reference in Scrivener, letting it load in your browser, which will happily navigate around your hard drive as if it were a remote website, at least until you reach the boundary of whatever limits you set on the downloader.