.\" generated with Ronn/v0.7.3
.\" http://github.com/rtomayko/ronn/tree/0.7.3
.
.TH "FITMLE" "1" "September 2017" "www.complex-networks.net" "www.complex-networks.net"
.
.SH "NAME"
\fBfitmle\fR \- Fit a set of values with a power\-law distribution
.
.SH "SYNOPSIS"
\fBfitmle\fR \fIdata_in\fR [\fItol\fR [TEST [\fInum_test\fR]]]
.
.SH "DESCRIPTION"
\fBfitmle\fR fits the data points contained in the file \fIdata_in\fR with a power\-law function P(k) ~ k^(\-gamma), using the Maximum\-Likelihood Estimator (MLE)\. In particular, \fBfitmle\fR finds the exponent \fBgamma\fR and the smallest of the input values for which the power\-law behaviour holds\. The second (optional) argument \fItol\fR sets the acceptable statistical error on the estimate of the exponent\.
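.
.P
As an illustration of the estimator, the following Python sketch computes the continuous maximum\-likelihood estimate of \fBgamma\fR and its first\-order standard error for a given lower cut\-off\. It is not the code used by \fBfitmle\fR: the function name \fBfit_exponent\fR is hypothetical, the cut\-off is passed in by hand rather than searched for, and the discrete procedure that \fBfitmle\fR applies to integer data is not reproduced\.
.
.IP "" 4
.
.nf

```python
import math

def fit_exponent(values, k_min):
    """Continuous MLE of gamma for the values not smaller than k_min."""
    tail = [v for v in values if v >= k_min]
    n = len(tail)
    log_sum = sum(math.log(v / k_min) for v in tail)
    if n == 0 or log_sum == 0.0:
        raise ValueError("not enough distinct values above k_min")
    gamma = 1.0 + n / log_sum
    # Standard error of the estimate, to first order in 1/n
    sigma = (gamma - 1.0) / math.sqrt(n)
    return gamma, sigma
```
.
.fi
.
.IP "" 0
.
.P
The standard error shrinks as more points lie above the cut\-off, which is why a larger \fItol\fR can be satisfied by a shorter tail\.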
.
.P
If \fBTEST\fR is provided, the program associates a p\-value with the goodness of the fit, based on the Kolmogorov\-Smirnov statistic computed on \fInum_test\fR distributions sampled from the theoretical power\-law function\. If \fInum_test\fR is not provided, the test is based on 100 sampled distributions\.
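.
.P
The idea behind the test can be sketched in Python as follows\. This is a simplified illustration under stated assumptions, not the implementation in \fBfitmle\fR: it treats the data as continuous, keeps the cut\-off fixed instead of re\-estimating it for every synthetic sample, and uses the hypothetical names \fBks_distance\fR, \fBmle_gamma\fR and \fBp_value\fR\.
.
.IP "" 4
.
.nf

```python
import math
import random

def ks_distance(tail, k_min, gamma):
    """Largest gap between the empirical CDF and the power-law CDF."""
    xs = sorted(tail)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / k_min) ** (1.0 - gamma)
        # The empirical CDF jumps from i/n to (i+1)/n at x
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst

def mle_gamma(tail, k_min):
    """Continuous maximum-likelihood estimate of the exponent."""
    return 1.0 + len(tail) / sum(math.log(x / k_min) for x in tail)

def p_value(tail, k_min, num_test=100, seed=0):
    """Fraction of synthetic samples that fit worse than the data."""
    rng = random.Random(seed)
    gamma = mle_gamma(tail, k_min)
    observed = ks_distance(tail, k_min, gamma)
    larger = 0
    for _ in range(num_test):
        # Inverse-transform sampling from the fitted power law
        sample = [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
                  for _ in tail]
        if ks_distance(sample, k_min, mle_gamma(sample, k_min)) > observed:
            larger += 1
    return larger / num_test
```
.
.fi
.
.IP "" 0
.
.P
Each synthetic sample has the same size as the input tail and is refitted before its Kolmogorov\-Smirnov distance is compared with the observed one, so the returned fraction plays the role of the p\-value\.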
.
.SH "PARAMETERS"
.
.TP
\fIdata_in\fR
Set of values to fit\. If it is equal to \fB\-\fR (dash), the set is read from STDIN\.
.
.TP
\fItol\fR
The acceptable statistical error on the estimation of the exponent\. If omitted, it is set to 0\.1\.
.
.TP
TEST
If the third parameter is \fBTEST\fR, the program computes an estimate of the p\-value associated with the best fit, based on \fInum_test\fR synthetic samples of the same size as the input set\.
.
.TP
\fInum_test\fR
Number of synthetic samples to use for the estimation of the p\-value of the best fit\.
.
.SH "OUTPUT"
If \fBfitmle\fR is given fewer than three parameters (i\.e\., if \fBTEST\fR is not specified), the output is a line in the format:
.
.IP "" 4
.
.nf

    gamma k_min ks
.
.fi
.
.IP "" 0
.
.P
where \fBgamma\fR is the estimate for the exponent, \fBk_min\fR is the smallest of the input values for which the power\-law behaviour holds, and \fBks\fR is the value of the Kolmogorov\-Smirnov statistic of the best fit\.
.
.P
If \fBTEST\fR is specified, the output line also contains the estimate of the p\-value of the best fit:
.
.IP "" 4
.
.nf

    gamma k_min ks p\-value
.
.fi
.
.IP "" 0
.
.P
where \fBp\-value\fR is based on \fInum_test\fR samples (or just 100, if \fInum_test\fR is not specified) of the same size as the input, drawn from the theoretical power\-law function obtained as the best fit\.
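.
.P
When the result is consumed by a script, the line can be split into named fields\. The following Python sketch assumes only the two formats described above; the helper \fBparse_fitmle_line\fR is hypothetical and is not part of the package\.
.
.IP "" 4
.
.nf

```python
def parse_fitmle_line(line):
    """Split a fitmle result line into named fields."""
    fields = line.split()
    if len(fields) not in (3, 4):
        raise ValueError("expected 3 or 4 fields, got %d" % len(fields))
    result = {
        "gamma": float(fields[0]),
        "k_min": float(fields[1]),
        "ks": float(fields[2]),
    }
    # The fourth field is present only when TEST was requested
    if len(fields) == 4:
        result["p_value"] = float(fields[3])
    return result

fit = parse_fitmle_line("2.0585 19 0.0253754 0.924")
```
.
.fi
.
.IP "" 0
.
.P
Only the result line should be passed to the helper, since it rejects any line that does not have three or four fields\.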
.
.SH "EXAMPLES"
Let us assume that the file \fBAS\-20010316\.net_degs\fR contains the degree sequence of the data set \fBAS\-20010316\.net\fR (the graph of the Internet at the AS level in March 2001)\. The exponent of the best\-fit power\-law distribution can be obtained by using:
.
.IP "" 4
.
.nf

    $ fitmle AS\-20010316\.net_degs
    Using discrete fit
    2\.06165 6 0\.031626
    $
.
.fi
.
.IP "" 0
.
.P
where \fB2\.06165\fR is the estimated value of the exponent \fBgamma\fR, \fB6\fR is the minimum degree value for which the power\-law behaviour holds, and \fB0\.031626\fR is the value of the Kolmogorov\-Smirnov statistic of the best fit\. The program also reports that it chose the discrete fitting procedure, since all the values in \fBAS\-20010316\.net_degs\fR are integers\. This message is printed to STDERR\.
.
.P
It is possible to compute the p\-value of the estimate by running:
.
.IP "" 4
.
.nf

    $ fitmle AS\-20010316\.net_degs 0\.1 TEST
    Using discrete fit
    2\.06165 6 0\.031626 0\.17
    $
.
.fi
.
.IP "" 0
.
.P
which provides a p\-value equal to 0\.17, meaning that 17% of the synthetic samples showed a value of the KS statistic larger than that of the best fit\. The estimation of the p\-value here is based on 100 synthetic samples, since \fInum_test\fR was not provided\. If we allow a slightly larger statistical error on the estimate of the exponent \fBgamma\fR, we obtain different values of \fBgamma\fR and \fBk_min\fR, and a much higher p\-value:
.
.IP "" 4
.
.nf

    $ fitmle AS\-20010316\.net_degs 0\.15 TEST 1000
    Using discrete fit
    2\.0585 19 0\.0253754 0\.924
    $
.
.fi
.
.IP "" 0
.
.P
Notice that in this case, the p\-value of the estimate is equal to 0\.924, and is based on 1000 synthetic samples\.
.
.SH "SEE ALSO"
deg_seq(1), power_law(1)
.
.SH "REFERENCES"
.
.IP "\(bu" 4
A\. Clauset, C\. R\. Shalizi, and M\. E\. J\. Newman\. "Power\-law distributions in empirical data"\. SIAM Rev\. 51 (2009), 661\-703\.
.
.IP "\(bu" 4
V\. Latora, V\. Nicosia, G\. Russo, "Complex Networks: Principles, Methods and Applications", Chapter 5, Cambridge University Press (2017)
.
.IP "" 0
.
.SH "AUTHORS"
(c) Vincenzo \'KatolaZ\' Nicosia 2009\-2017 \fB<v\.nicosia@qmul\.ac\.uk>\fR\.