<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Shibumi Dojo &#187; wget</title>
	<atom:link href="http://www.shibumidojo.org/index.php/tag/wget/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.shibumidojo.org</link>
	<description></description>
	<lastBuildDate>Mon, 16 Jan 2012 07:48:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>wget and downloading the entire directory</title>
		<link>http://www.shibumidojo.org/index.php/2009/06/28/wget-and-downloading-the-enrite-directory/</link>
		<comments>http://www.shibumidojo.org/index.php/2009/06/28/wget-and-downloading-the-enrite-directory/#comments</comments>
		<pubDate>Sun, 28 Jun 2009 14:44:02 +0000</pubDate>
		<dc:creator>CorpusCallosum</dc:creator>
				<category><![CDATA[GNU Linux]]></category>
		<category><![CDATA[wget]]></category>

		<guid isPermaLink="false">http://www.shibumidojo.org/?p=176</guid>
		<description><![CDATA[Sometimes I need to retrieve an entire remote directory. Normally a graphical user interface is the simplest way to download a directory, but sometimes we prefer not to use one, or we cannot, for example when working on GNU Linux servers without a desktop environment. In that case, we can [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes I need to retrieve an entire remote directory. Normally a graphical user interface is the simplest way to download a directory, but sometimes we prefer not to use one, or we cannot, for example when working on GNU Linux servers without a desktop environment. In that case, we can use <em><strong>wget</strong></em> to get the whole directory. <span id="more-176"></span>Most of you have probably heard of <em><strong>wget</strong></em>; here I will show the specific options that let you download a directory from a remote URL. My thanks go to <span style="color: #800000;"><strong>&#8220;Andrea Ben Benini&#8221;</strong></span> for this useful tip.</p>
<blockquote><p><span style="color: #33cccc;">wget -r --level=0 -E --ignore-length -x -k -p -erobots=off -np -N http://www.shibumidojo.org/something/directory</span></p></blockquote>
<p><em><span style="text-decoration: underline;">Here are the options:</span></em></p>
<p><strong>-r</strong>: recursive retrieval (important)<br />
<strong>--level=0</strong>: maximum recursion depth (0 means no limit); very important<br />
<strong>-E</strong>: append the &#8220;.html&#8221; extension to every document served as &#8220;<strong>text/html</strong>&#8221; (or application/xhtml+xml) whose name does not already end in it;<br />
useful when you deal with dirs (that are not really dirs but index.html files)<br />
<strong>--ignore-length</strong>: ignore &#8220;Content-Length&#8221; HTTP headers; sometimes useful when dealing with buggy CGI programs<br />
<strong>-x</strong>: force directories; create a hierarchy of directories even if one would not have been created otherwise<br />
<strong>-k</strong>: one of the most useful options; it converts the links in downloaded documents so that they point to the local copies, for offline viewing<br />
<strong>-p</strong>: download all the files that are necessary for proper display of the page<br />
(not so reliable when dealing with JS code, but useful)<br />
<strong>-erobots=off</strong>: do not honor robots.txt<br />
<strong>-np</strong>: no parent; do not ascend to the parent dir when retrieving recursively,<br />
one of the most useful options I&#8217;ve seen<br />
<strong>-N</strong>: timestamping; skip files that are not newer than the local copy</p>
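<p>The same command can be spelled with wget&#8217;s long option names, which are easier to read back later. The following is only a minimal sketch: the function name <strong>mirror_dir</strong> and the <strong>DRY_RUN</strong> variable are hypothetical helpers made up for this example, not part of wget. <strong>-E</strong> is left in its short form because its long name differs between wget versions.</p>

```shell
#!/bin/sh
# Sketch: download one remote directory recursively with wget.
# mirror_dir and DRY_RUN are illustrative names, not wget features.
mirror_dir() {
    url=$1

    # Long-option spelling of: -r --level=0 -np -E --ignore-length -x -k -p -N -erobots=off
    set -- wget \
        --recursive --level=0 \
        --no-parent \
        -E \
        --ignore-length \
        --force-directories \
        --convert-links \
        --page-requisites \
        --timestamping \
        -e robots=off \
        "$url"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

<p>Calling <strong>DRY_RUN=1 mirror_dir http://www.shibumidojo.org/something/directory</strong> prints the full command without touching the network, which is a cheap way to check the options before starting a long download.</p>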
]]></content:encoded>
			<wfw:commentRss>http://www.shibumidojo.org/index.php/2009/06/28/wget-and-downloading-the-enrite-directory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

