## Burrow-The-Burrows

A Gopher burrower in a shell script. By using `burrow` and a bit of
plumbing you can get all the links in a Gopher MENU, recursively visit
all the available subdirs, and create a directed graph of the visited
selectors.

`burrow` takes as input a gopher identifier, as generated by
`url_to_id`, and treats the document it points to as a gophermap. It
prints on stdout the list of menu selectors found in that document.
`burrow` also dumps on stderr the list of all the edges (to any kind of
selector) found in that page, in the format:

	src_SHA256 dst_SHA256

where `src_SHA256` is the SHA256 of the source selector (the current
document), while `dst_SHA256` is the SHA256 of the destination selector
(the document it points to).
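Since each edge is just a pair of hashes, the edge list can be turned
into a drawable graph with very little plumbing. The following is a
minimal sketch, not part of `burrow` itself: it assumes the edges have
been collected in a file (called `graph.txt` here), and the function
name `edges_to_dot` is hypothetical. It emits a Graphviz DOT digraph,
skipping lines that do not have exactly two fields and dropping
duplicate edges.

```shell
#!/bin/sh
# Sketch: convert burrow's edge list ("src_SHA256 dst_SHA256" per line)
# into a Graphviz DOT digraph. edges_to_dot is a hypothetical helper.
edges_to_dot() {
	echo 'digraph burrow {'
	# Keep only well-formed two-field lines and print each edge once.
	awk 'NF == 2 && !seen[$1 " " $2]++ {
		printf "\t\"%s\" -> \"%s\";\n", $1, $2
	}' "$@"
	echo '}'
}
```

It reads the files given as arguments, or stdin if there are none, so
something like `edges_to_dot graph.txt > graph.dot` should work. The
`graph.dot` file name is an assumption, and rendering it requires
Graphviz to be installed.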

To start a crawl, one can do something like:

```
	$ ./url_to_id  gopher://your.gopher.url/ > ids
	$ tail -f ids | parallel -j2 './burrow {}' 2>> graph.txt | tee -a ids >/dev/null &
``` 
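The crawl above runs in the background and keeps appending edges to
`graph.txt`, so it can be handy to check how far it has got. The
following is a rough sketch under the assumption that `graph.txt` holds
one `src_SHA256 dst_SHA256` pair per line; `graph_stats` is a
hypothetical helper, not something shipped with `burrow`. Lines that do
not have exactly two fields (for instance error messages that also end
up on stderr) are ignored.

```shell
#!/bin/sh
# Sketch: count distinct nodes and edges in burrow's edge list.
# graph_stats is a hypothetical helper, not part of burrow.
graph_stats() {
	awk 'NF == 2 {
		if (!edge[$1 " " $2]++) edges++   # first time this edge is seen
		if (!node[$1]++) nodes++          # first time the source is seen
		if (!node[$2]++) nodes++          # first time the destination is seen
	}
	END { printf "nodes=%d edges=%d\n", nodes, edges }' "$@"
}
```

Running `graph_stats graph.txt` from time to time gives an idea of the
crawl's progress. The counts are only a snapshot, since the file may
still be growing while it is read.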

Note that `burrow` creates a number of folders in the current
directory, which it uses to keep track of the selectors that have
already been retrieved.