Inspired by a different project, I set out to write a clone with fewer features, less polish, and less sophisticated parsing.
The result is a command-line tool that you feed a URL, and it barfs out all subdirectories and files listed, in no particular order, together with their corresponding sizes in bytes.
A bit like the Unix tool du, but for open directories.
$ iwicrawl --help
iwicrawl 0.1.0
iwikal <e.joel.nordstrom@gmail.com>
USAGE:
iwicrawl [FLAGS] <URL>
FLAGS:
-h, --help Prints help information
-q, --quiet Don't print anything except standard output
-V, --version Prints version information
-v, --verbose Increase message verbosity for each occurrance of the flag
ARGS:
<URL> The url of the directory to crawl
$ iwicrawl localhost
6 http://localhost/file2
327 http://localhost/file1
15 http://localhost/subdir/file3
878841856 http://localhost/subdir/file4
878841871 http://localhost/subdir/
878842204 http://localhost/
Finished in 16ms
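In the output above, each directory's size is the sum of everything beneath it (15 + 878841856 = 878841871 for subdir/, plus 6 + 327 for the root). Here is a rough sketch of that roll-up using only the standard library. The function name `directory_totals` and the flat list of (URL, size) pairs are assumptions made for illustration, not necessarily how iwicrawl does it internally.

```rust
use std::collections::BTreeMap;

/// Sum file sizes into the root and every nested directory that contains them.
/// `root` is the crawled directory URL and is assumed to end in '/'.
/// `files` holds (file URL, size in bytes) pairs.
fn directory_totals(root: &str, files: &[(&str, u64)]) -> BTreeMap<String, u64> {
    let mut totals: BTreeMap<String, u64> = BTreeMap::new();
    for &(url, size) in files {
        // Ignore anything that does not live under the crawled root.
        if let Some(rest) = url.strip_prefix(root) {
            // The root contains every file.
            *totals.entry(root.to_string()).or_insert(0) += size;
            // Each '/' in the remaining path closes one more nested directory.
            for (idx, _) in rest.match_indices('/') {
                let dir = format!("{}{}", root, &rest[..=idx]);
                *totals.entry(dir).or_insert(0) += size;
            }
        }
    }
    totals
}
```

Feeding it the four files from the sample run with the root http://localhost/ gives 878841871 for http://localhost/subdir/ and 878842204 for http://localhost/, matching the listing.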
It sends a HEAD request for each file listed, and I haven't implemented any throttling yet, so it might be a bit of a denial-of-service machine. Use responsibly.
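To make the HEAD-request part concrete, here is a minimal standard-library sketch of asking a server for a file's size without downloading the body. The helper names (`head_request`, `content_length`, `fetch_size`) are hypothetical, and it assumes plain HTTP on port 80; the real tool may well use an HTTP client crate instead.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a minimal HTTP/1.1 HEAD request for `path` on `host`.
fn head_request(host: &str, path: &str) -> String {
    format!(
        "HEAD {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        path, host
    )
}

/// Pull the Content-Length value out of raw response headers, if present.
fn content_length(response: &str) -> Option<u64> {
    response
        .lines()
        .skip(1) // skip the status line
        .take_while(|line| !line.is_empty()) // headers end at the blank line
        .find_map(|line| {
            let (name, value) = line.split_once(':')?;
            if name.trim().eq_ignore_ascii_case("content-length") {
                value.trim().parse::<u64>().ok()
            } else {
                None
            }
        })
}

/// Send the HEAD request over a plain TCP connection (assumption: HTTP on port 80).
fn fetch_size(host: &str, path: &str) -> std::io::Result<Option<u64>> {
    let mut stream = TcpStream::connect((host, 80))?;
    stream.write_all(head_request(host, path).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(content_length(&response))
}
```

Because every listed file costs one such round trip, firing them all off at once is exactly where the lack of throttling bites.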
I've compiled some binaries for Windows and Linux. I currently don't have any way to test OSX binaries.
- Install Rust
- Run:
cargo install --git https://github.com/iwikal/iwicrawl.git