author KatolaZ <katolaz@freaknet.org> 2018-08-01 10:27:54 +0100
committer KatolaZ <katolaz@freaknet.org> 2018-08-01 10:27:54 +0100
commit 11856792221c7faa2f423bc106d1f4b1482bcdb8
tree 10f5bb3a40ab184272efb31bfecc3bcc3da136e9 /README.md
parent 040ba18f7fd17b4d6dc3a93549a19263fb0b8a95
new url_to_id and added dry-run in burrow
Diffstat (limited to 'README.md')
-rw-r--r-- README.md | 29
1 files changed, 29 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..be73adc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,29 @@
+## Burrow-The-Burrows
+
+A Gopher burrower in a shell script. By using `burrow` and a bit of
+plumbing you can get all the links in a Gopher MENU, recursively visit
+all the available subdirs, and create a directed graph of the visited
+selectors.
+
+`burrow` takes as input a Gopher identifier, as generated by
+`url_to_id`, treats the document it points to as a gophermap, and
+prints on stdout the list of menu selectors found in that document.
+`burrow` also dumps on stderr the list of all the edges (to any kind
+of selector) found in that page, in the format:
+
+ src_SHA256 dst_SHA256
+
+where `src_SHA256` is the SHA256 of the source selector (the current
+document), and `dst_SHA256` is the SHA256 of the destination selector
+(the document pointed to).
+
+To start a crawl, one can do something like:
+
+```
+ $ ./url_to_id gopher://your.gopher.url/ > ids
+ $ tail -f ids | parallel -j2 './burrow {}' 2>> graph.txt | tee -a ids >/dev/null &
+```
+
+Note that `burrow` will create a certain number of folders in the
+current directory, used to keep track of the selectors that have
+already been retrieved.
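
The `src_SHA256 dst_SHA256` edge list described in the README can be turned into a drawable graph with a little more plumbing. The following is only a sketch, not part of this commit: `edges_to_dot` is a hypothetical helper, and it assumes that the file collected from stderr (`graph.txt` in the README example) may also contain other diagnostic lines, so it keeps only lines with exactly two whitespace-separated fields and drops repeated edges.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with burrow): convert an edge list of
# "src_SHA256 dst_SHA256" lines into Graphviz DOT.
# Reads the files given as arguments, or stdin when none are given.
edges_to_dot() {
    printf 'digraph burrow {\n'
    # Assumption: any line that does not have exactly two fields is noise.
    # The seen[] array suppresses duplicate edges.
    awk 'NF == 2 && !seen[$1 " " $2]++ {
        printf "  \"%s\" -> \"%s\";\n", $1, $2
    }' "$@"
    printf '}\n'
}
```

A possible use, once a crawl has written some edges, is `edges_to_dot graph.txt > graph.dot`; the resulting file can then be passed to any tool that reads DOT.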