new url_to_id and added dry-run in burrow

author: KatolaZ <katolaz@freaknet.org> 2018-08-01 10:27:54 +0100
committer: KatolaZ <katolaz@freaknet.org> 2018-08-01 10:27:54 +0100
commit: 11856792221c7faa2f423bc106d1f4b1482bcdb8 (patch)
tree: 10f5bb3a40ab184272efb31bfecc3bcc3da136e9 /README.md
parent: 040ba18f7fd17b4d6dc3a93549a19263fb0b8a95 (diff)
1 files changed, 29 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..be73adc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,29 @@
+## Burrow-The-Burrows
+
+A Gopher burrower in a shell script. By using `burrow` and a bit of
+plumbing you can get all the links in a Gopher MENU, recursively visit
+all the available subdirs, and create a directed graph of the visited
+selectors.
+
+`burrow` takes as input a Gopher identifier, as generated by
+`url_to_id`, treats the identified document as a gophermap, and prints
+on stdout the list of menu selectors found in it. `burrow` also dumps
+on stderr the list of all the edges (to any kind of selector) found in
+that page, in the format:
+
+	src_SHA256 dst_SHA256
+
+where `src_SHA256` is the SHA256 of the source selector (the current
+document), while `dst_SHA256` is the SHA256 of the destination selector
+(the document pointed to).
+
+To start a crawl, one can do something like:
+
+```
+	$ ./url_to_id  gopher://your.gopher.url/ > ids
+	$ tail -f ids | parallel -j2 './burrow {}' 2>> graph.txt | tee -a ids >/dev/null &
+``` 
+
+Note that `burrow` will create a number of folders in the current
+directory, which it uses to keep track of the selectors that have
+already been retrieved.
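
The edge list that the crawl in the README appends to `graph.txt` can be summarised with standard tools. What follows is a minimal sketch, assuming each edge sits on its own line as two whitespace-separated fields; `graph_stats` is a hypothetical helper, not part of this repository, and lines with any other shape (for example diagnostics that also ended up on stderr) are skipped.

```shell
#!/bin/sh
# graph_stats FILE
# Hypothetical helper (not part of burrow): counts distinct nodes and
# distinct edges in an edge list of "src_SHA256 dst_SHA256" lines.
graph_stats() {
	# sort -u drops edges that were recorded more than once
	sort -u "$1" | awk '
		# assumption: an edge line has exactly two fields
		NF == 2 { edges++; nodes[$1] = 1; nodes[$2] = 1 }
		END {
			n = 0
			for (k in nodes) n++
			printf "nodes=%d edges=%d\n", n, edges
		}'
}
```

Calling `graph_stats graph.txt` while the crawl is running gives a rough picture of how far it has progressed, since the file only ever grows.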