Multi-domain and Explainable Prediction of Changes in Web Vocabularies

Supplemental paper materials

This page contains additional data and analysis supporting the paper:

Characteristics of input datasets

The following table describes the 139 datasets evaluated in the paper. Each dataset consists of exactly one version chain. One version chain consists of a varying number of versions (at least 3). Datasets 1 and 2 are the CEDAR and DBpedia datasets. Datasets 3-136 are the LOV datasets (i.e. retrieved via the Linked Open Vocabularies API). Datasets 137-139 are the SPARQL datasets (i.e. version chains reconstructed after querying the 637 public SPARQL endpoints in datahub.io). Each dataset is described by the following characteristics/features:

dataset totalSize nSnapshots avgGap avgSize nInserts nDeletes nComm ratioInserts ratioDeletes ratioComm maxTreeDepth avgTreeDepth totalInstances ratioInstances ratioInstancesVSIS totalStructural ratioStructural ratioStructuralVSIS
1 19639805 8 4223.57 2454975.62 17420246 19043105 2052 0.48 0.52 0.00 3 3.00 2173259 0.11 1.00 3585 0.00 0.00
2 50084545 5 278.75 10016909.00 15269985 4862044 29313340 0.31 0.10 0.59 2 2.00 50049049 1.00 1.00 2049 0.00 0.00
3 736 4 181.33 184.00 233 367 222 0.28 0.45 0.27 1 1.00 121 0.16 0.91 12 0.02 0.09
4 776 3 67.00 258.67 384 250 221 0.45 0.29 0.26 1 1.00 110 0.14 0.75 37 0.05 0.25
5 3281 3 99.50 1093.67 1389 1861 631 0.36 0.48 0.16 1 1.00 1093 0.33 0.64 603 0.18 0.36
6 1395 3 1607.00 465.00 444 660 344 0.31 0.46 0.24 3 3.00 296 0.21 0.80 72 0.05 0.20
7 1853 3 61.50 617.67 397 470 817 0.24 0.28 0.49 1 1.67 230 0.12 0.73 86 0.05 0.27
8 1108 5 203.00 221.60 344 274 605 0.28 0.22 0.49 1 1.00 190 0.17 0.67 94 0.08 0.33
9 518 4 256.67 129.50 82 39 328 0.18 0.09 0.73 1 1.00 94 0.18 0.84 18 0.03 0.16
10 351 3 385.50 117.00 53 30 195 0.19 0.11 0.70 2 1.67 66 0.19 0.61 42 0.12 0.39
11 543 4 256.67 135.75 82 37 345 0.18 0.08 0.74 2 2.00 93 0.17 0.79 24 0.04 0.21
12 330 3 243.50 110.00 37 22 191 0.15 0.09 0.76 2 2.00 53 0.16 0.78 15 0.05 0.22
13 6896 4 104.33 1724.00 1369 1359 3903 0.21 0.20 0.59 5 5.00 935 0.14 0.67 453 0.07 0.33
14 4266 3 259.50 1422.00 1398 1742 1310 0.31 0.39 0.29 3 2.33 844 0.20 0.76 264 0.06 0.24
15 2371 7 502.00 338.71 1432 513 895 0.50 0.18 0.32 3 1.57 339 0.14 0.76 106 0.04 0.24
16 5637 1 298.00 5637.00 7 8.00 1200 0.21 0.72 470 0.08 0.28
17 387 3 368.50 129.00 117 136 130 0.31 0.36 0.34 2 2.00 119 0.31 0.69 54 0.14 0.31
18 1027 4 449.00 256.75 338 457 378 0.29 0.39 0.32 2 2.00 293 0.29 0.83 58 0.06 0.17
19 744 3 789.00 248.00 69 80 428 0.12 0.14 0.74 1 1.00 158 0.21 0.96 6 0.01 0.04
20 139 3 1177.50 46.33 27 30 67 0.22 0.24 0.54 1 1.33 12 0.09 0.36 21 0.15 0.64
21 5455 12 137.82 454.58 709 1196 3768 0.12 0.21 0.66 3 0.25 1237 0.23 0.93 94 0.02 0.07
22 547 3 486.50 182.33 329 438 2 0.43 0.57 0.00 1 1.00 143 0.26 0.66 73 0.13 0.34
23 499 6 329.40 83.17 74 69 356 0.15 0.14 0.71 1 1.00 118 0.24 0.84 23 0.05 0.16
24 3134 4 126.00 783.50 119 142 2241 0.05 0.06 0.90 4 2.75 1243 0.40 0.70 528 0.17 0.30
25 672 5 100.60 134.40 174 104 387 0.26 0.16 0.58 2 1.80 174 0.26 0.81 41 0.06 0.19
26 303 5 226.25 60.60 47 43 201 0.16 0.15 0.69 1 1.00 79 0.26 0.83 16 0.05 0.17
27 10017 6 615.20 1669.50 6986 7500 1297 0.44 0.48 0.08 4 3.33 2395 0.24 0.55 1993 0.20 0.45
28 1796 9 352.38 199.56 398 138 1330 0.21 0.07 0.71 2 1.11 510 0.28 0.94 34 0.02 0.06
29 59 3 806.00 19.67 17 16 25 0.29 0.28 0.43 0 0.00 6 0.10 1.00 0 0.00 0.00
30 2092 6 153.20 348.67 534 221 1450 0.24 0.10 0.66 1 1.00 258 0.12 0.91 24 0.01 0.09
31 418 3 806.00 139.33 97 98 183 0.26 0.26 0.48 0 0.00 45 0.11 1.00 0 0.00 0.00
32 2263 9 130.67 251.44 891 579 1349 0.32 0.21 0.48 3 1.44 809 0.36 0.90 93 0.04 0.10
33 2592 3 806.00 864.00 460 451 1276 0.21 0.21 0.58 1 1.00 296 0.11 0.92 27 0.01 0.08
34 343 3 806.00 114.33 67 68 163 0.22 0.23 0.55 1 1.00 36 0.10 0.86 6 0.02 0.14
35 1697 3 173.00 565.67 380 513 668 0.24 0.33 0.43 1 1.00 381 0.22 0.95 21 0.01 0.05
36 87839 7 42.43 12548.43 41513 45645 30183 0.35 0.39 0.26 6 4.29 18942 0.22 0.50 18972 0.22 0.50
37 1414 3 335.50 471.33 353 481 505 0.26 0.36 0.38 2 2.00 273 0.19 0.58 195 0.14 0.42
38 707 6 254.00 117.83 396 312 250 0.41 0.33 0.26 1 1.00 99 0.14 0.94 6 0.01 0.06
39 11736 6 370.80 1956.00 3601 3710 6168 0.27 0.28 0.46 4 3.33 2139 0.18 0.64 1180 0.10 0.36
40 2051 7 611.33 293.00 1902 1812 82 0.50 0.48 0.02 1 0.86 460 0.22 0.83 96 0.05 0.17
41 17470 7 118.00 2495.71 6144 5584 9006 0.30 0.27 0.43 4 3.29 3444 0.20 0.81 792 0.05 0.19
42 1710 4 470.00 427.50 684 614 632 0.35 0.32 0.33 3 3.00 306 0.18 0.86 50 0.03 0.14
43 658 3 126.00 219.33 73 30 396 0.15 0.06 0.79 1 1.00 103 0.16 0.79 27 0.04 0.21
44 1359 4 287.33 339.75 121 116 909 0.11 0.10 0.79 1 1.00 298 0.22 0.96 12 0.01 0.04
45 800 3 335.50 266.67 178 274 296 0.24 0.37 0.40 1 1.00 161 0.20 0.62 98 0.12 0.38
46 29393 15 91.79 1959.53 8383 8097 19313 0.23 0.23 0.54 5 3.67 7359 0.25 0.62 4601 0.16 0.38
47 5624 10 356.22 562.40 276 111 4889 0.05 0.02 0.93 2 2.00 1440 0.26 0.91 137 0.02 0.09
48 1393 6 61.00 232.17 251 24 1023 0.19 0.02 0.79 2 1.83 500 0.36 0.95 24 0.02 0.05
49 870 3 82.00 290.00 195 267 341 0.24 0.33 0.42 2 2.00 166 0.19 0.82 37 0.04 0.18
50 1561 4 55.00 390.25 444 576 634 0.27 0.35 0.38 3 3.25 301 0.19 0.69 135 0.09 0.31
51 6670 3 691.50 2223.33 641 641 3863 0.12 0.12 0.75 2 2.33 2573 0.39 0.92 216 0.03 0.08
52 616 3 347.00 205.33 166 221 210 0.28 0.37 0.35 3 2.00 98 0.16 0.60 66 0.11 0.40
53 45667 12 274.45 3805.58 7632 3445 35386 0.16 0.07 0.76 2 1.25 8716 0.19 0.99 120 0.00 0.01
54 15959 4 231.67 3989.75 5526 6665 5588 0.31 0.37 0.31 3 2.75 2008 0.13 0.90 225 0.01 0.10
55 8803 3 365.50 2934.33 655 646 5224 0.10 0.10 0.80 4 5.00 1767 0.20 0.54 1485 0.17 0.46
56 253 3 475.50 84.33 36 56 123 0.17 0.26 0.57 1 0.33 44 0.17 0.83 9 0.04 0.17
57 164 3 661.00 54.67 92 39 54 0.50 0.21 0.29 1 0.67 35 0.21 0.95 2 0.01 0.05
58 934 3 120.00 311.33 78 54 557 0.11 0.08 0.81 3 2.33 144 0.15 0.71 59 0.06 0.29
59 1997 3 82.50 665.67 453 620 768 0.25 0.34 0.42 2 2.00 406 0.20 0.84 79 0.04 0.16
60 3552 5 39.00 710.40 384 526 2318 0.12 0.16 0.72 5 2.80 648 0.18 0.70 273 0.08 0.30
61 376 3 376.50 125.33 68 45 200 0.22 0.14 0.64 1 1.00 89 0.24 0.93 7 0.02 0.07
62 367 3 337.50 122.33 190 82 106 0.50 0.22 0.28 3 2.00 58 0.16 0.79 15 0.04 0.21
63 3397 4 302.00 849.25 1730 1299 1139 0.42 0.31 0.27 2 1.50 663 0.20 0.76 212 0.06 0.24
64 17391 6 376.80 2898.50 10527 5808 6527 0.46 0.25 0.29 4 1.83 4913 0.28 0.85 843 0.05 0.15
65 74610 4 402.33 18652.50 29449 17608 36442 0.35 0.21 0.44 1 1.25 7242 0.10 0.59 4944 0.07 0.41
66 691 3 375.00 230.33 348 321 168 0.42 0.38 0.20 0 0.00 143 0.21 0.63 85 0.12 0.37
67 6206 13 205.38 477.38 1084 338 5021 0.17 0.05 0.78 1 0.85 563 0.09 0.95 27 0.00 0.05
68 1818 4 112.50 454.50 951 1104 411 0.39 0.45 0.17 3 2.00 473 0.26 0.78 134 0.07 0.22
69 406 3 485.00 135.33 61 39 226 0.19 0.12 0.69 1 1.00 79 0.19 0.73 29 0.07 0.27
70 1012 16 57.60 63.25 259 143 746 0.23 0.12 0.65 1 0.31 146 0.14 0.92 13 0.01 0.08
71 782 4 272.33 195.50 41 16 556 0.07 0.03 0.91 1 1.00 151 0.19 0.97 4 0.01 0.03
72 570 4 481.33 142.50 273 190 213 0.40 0.28 0.32 2 1.50 163 0.29 0.95 8 0.01 0.05
73 11308 4 254.67 2827.00 5859 5434 3399 0.40 0.37 0.23 2 2.00 2418 0.21 0.61 1551 0.14 0.39
74 2211 3 168.00 737.00 406 403 1071 0.22 0.21 0.57 2 2.33 577 0.26 0.73 209 0.09 0.27
75 42866 5 239.25 8573.20 8853 8591 26074 0.20 0.20 0.60 0 0.00 3718 0.09 1.00 0 0.00 0.00
76 21094 7 153.33 3013.43 4466 3962 13663 0.20 0.18 0.62 0 0.00 7327 0.35 1.00 0 0.00 0.00
77 309 4 589.00 77.25 140 87 120 0.40 0.25 0.35 1 1.00 72 0.23 0.91 7 0.02 0.09
78 3008 3 51.00 1002.67 1134 720 1149 0.38 0.24 0.38 5 4.67 509 0.17 0.65 277 0.09 0.35
79 2802 3 1565.00 934.00 1363 765 951 0.44 0.25 0.31 3 3.00 619 0.22 0.61 388 0.14 0.39
80 4062 10 52.44 406.20 359 265 3337 0.09 0.07 0.84 2 2.20 799 0.20 0.83 159 0.04 0.17
81 320 2 192.00 160.00 19 9 147 0.11 0.05 0.84 1 1.00 43 0.13 0.91 4 0.01 0.09
82 1318 3 174.50 439.33 1011 757 2 0.57 0.43 0.00 2 2.00 137 0.10 0.64 76 0.06 0.36
83 5244 4 230.00 1311.00 2754 2365 1598 0.41 0.35 0.24 1 1.00 873 0.17 0.84 164 0.03 0.16
84 162 3 6.50 54.00 97 57 32 0.52 0.31 0.17 0 0.00 41 0.25 1.00 0 0.00 0.00
85 394 3 6.00 131.33 84 135 159 0.22 0.36 0.42 1 1.00 105 0.27 0.75 35 0.09 0.25
86 458 4 184.33 114.50 22 20 326 0.06 0.05 0.89 1 1.00 124 0.27 0.97 4 0.01 0.03
87 529 3 81.50 176.33 110 136 223 0.23 0.29 0.48 3 2.00 82 0.16 0.59 56 0.11 0.41
88 858 4 339.00 214.50 122 95 533 0.16 0.13 0.71 2 1.50 234 0.27 0.84 46 0.05 0.16
89 5202 10 156.11 520.20 891 453 4009 0.17 0.08 0.75 2 2.40 932 0.18 0.92 81 0.02 0.08
90 1403 3 339.00 467.67 42 14 910 0.04 0.01 0.94 0 0.00 237 0.17 1.00 0 0.00 0.00
91 3651 3 497.00 1217.00 2197 2607 199 0.44 0.52 0.04 3 3.33 992 0.27 0.92 85 0.02 0.08
92 1820 8 293.00 227.50 484 226 1255 0.25 0.12 0.64 0 0.00 328 0.18 1.00 0 0.00 0.00
93 338 3 107.00 112.67 54 68 164 0.19 0.24 0.57 1 1.00 57 0.17 0.90 6 0.02 0.10
94 1405 3 229.50 468.33 399 148 627 0.34 0.13 0.53 2 2.00 220 0.16 0.76 68 0.05 0.24
95 521 5 95.75 104.20 119 134 289 0.22 0.25 0.53 1 1.00 122 0.23 0.88 16 0.03 0.12
96 3078 11 112.50 279.82 947 781 2013 0.25 0.21 0.54 2 1.64 1143 0.37 0.90 127 0.04 0.10
97 7523 5 245.75 1504.60 2720 1983 3881 0.32 0.23 0.45 2 2.60 845 0.11 0.78 235 0.03 0.22
98 2732 8 280.57 341.50 936 887 1554 0.28 0.26 0.46 3 2.50 703 0.26 0.79 188 0.07 0.21
99 2050 8 170.86 256.25 663 487 1259 0.28 0.20 0.52 2 1.25 787 0.38 0.90 90 0.04 0.10
100 628 3 619.00 209.33 219 210 205 0.35 0.33 0.32 1 0.33 205 0.33 0.93 16 0.03 0.07
101 1033 4 447.00 258.25 96 64 706 0.11 0.07 0.82 3 2.25 280 0.27 0.84 53 0.05 0.16
102 8903 4 235.33 2225.75 188 186 6492 0.03 0.03 0.95 0 0.00 1004 0.11 1.00 0 0.00 0.00
103 520 3 290.00 173.33 48 16 314 0.13 0.04 0.83 0 0.00 137 0.26 1.00 0 0.00 0.00
104 1098 3 277.50 366.00 309 513 289 0.28 0.46 0.26 1 1.33 261 0.24 0.72 101 0.09 0.28
105 1058 3 196.50 352.67 654 325 263 0.53 0.26 0.21 2 1.33 292 0.28 0.77 88 0.08 0.23
106 1266 3 382.00 422.00 334 460 429 0.27 0.38 0.35 2 2.33 216 0.17 0.59 151 0.12 0.41
107 4320 2 149.00 2160.00 1427 2893 1 0.33 0.67 0.00 6 3.50 1185 0.27 0.85 212 0.05 0.15
108 129601 20 50.32 6480.05 7417 1885 119431 0.06 0.01 0.93 3 2.50 27090 0.21 0.71 11088 0.09 0.29
109 420 4 790.33 105.00 111 50 255 0.27 0.12 0.61 2 1.75 116 0.28 0.80 29 0.07 0.20
110 17608 3 47.50 5869.33 7950 11915 1147 0.38 0.57 0.05 2 1.33 3948 0.22 0.98 82 0.00 0.02
111 108767 7 67.00 15538.14 27445 32918 61517 0.23 0.27 0.50 9 5.29 17655 0.16 0.57 13091 0.12 0.43
112 129 3 1020.00 43.00 29 25 63 0.25 0.21 0.54 0 0.00 46 0.36 0.90 5 0.04 0.10
113 379 3 92.50 126.33 10 9 245 0.04 0.03 0.93 1 1.00 87 0.23 0.96 4 0.01 0.04
114 932 3 641.00 310.67 169 93 490 0.22 0.12 0.65 3 2.67 191 0.20 0.69 85 0.09 0.31
115 1779 4 435.33 444.75 536 352 932 0.29 0.19 0.51 3 2.25 380 0.21 0.83 77 0.04 0.17
116 2216 5 156.00 443.20 337 172 1522 0.17 0.08 0.75 2 2.20 569 0.26 0.74 198 0.09 0.26
117 4644 3 57.50 1548.00 1240 1476 1668 0.28 0.34 0.38 5 2.67 1052 0.23 0.51 1006 0.22 0.49
118 2435 3 36.00 811.67 355 271 1302 0.18 0.14 0.68 4 4.00 547 0.22 0.66 288 0.12 0.34
119 296 3 1653.50 98.67 26 26 176 0.11 0.11 0.77 1 1.00 53 0.18 0.79 14 0.05 0.21
120 1806 5 213.25 361.20 84 27 1387 0.06 0.02 0.93 4 2.40 493 0.27 0.71 204 0.11 0.29
121 1215 6 225.40 202.50 180 106 880 0.15 0.09 0.75 0 0.00 221 0.18 1.00 0 0.00 0.00
122 820 5 174.50 164.00 362 340 322 0.35 0.33 0.31 1 1.00 128 0.16 0.64 73 0.09 0.36
123 166 3 1007.00 55.33 38 37 78 0.25 0.24 0.51 0 0.00 51 0.31 0.84 10 0.06 0.16
124 210 3 1025.50 70.00 56 36 102 0.29 0.19 0.53 1 1.00 69 0.33 0.84 13 0.06 0.16
125 1273 5 289.25 254.60 140 124 862 0.12 0.11 0.77 0 0.00 271 0.21 1.00 0 0.00 0.00
126 2309 4 249.67 577.25 1305 1274 484 0.43 0.42 0.16 4 3.75 464 0.20 0.60 312 0.14 0.40
127 7478 3 82.50 2492.67 1308 1712 3485 0.20 0.26 0.54 2 1.00 1636 0.22 0.92 142 0.02 0.08
128 12771 8 51.43 1596.38 259 165 10954 0.02 0.01 0.96 2 2.00 3467 0.27 0.95 179 0.01 0.05
129 14625 8 331.29 1828.12 12553 10984 1945 0.49 0.43 0.08 1 1.00 943 0.06 0.80 237 0.02 0.20
130 5650 2 415.00 2825.00 1443 1443 1383 0.34 0.34 0.32 2 2.00 1406 0.25 0.79 382 0.07 0.21
131 2266 5 284.75 453.20 1134 908 797 0.40 0.32 0.28 3 1.20 476 0.21 0.74 171 0.08 0.26
132 4505 6 967.20 750.83 3926 3194 445 0.52 0.42 0.06 2 1.17 1093 0.24 0.80 281 0.06 0.20
133 11196 4 1434.67 2799.00 7467 8779 580 0.44 0.52 0.03 3 2.75 2559 0.23 0.75 839 0.07 0.25
134 56130 10 186.11 5613.00 26124 24157 25200 0.35 0.32 0.33 6 4.10 12120 0.22 0.66 6322 0.11 0.34
135 1662 7 233.50 237.43 493 300 1046 0.27 0.16 0.57 1 1.00 269 0.16 0.96 12 0.01 0.04
136 859 3 406.00 286.33 298 296 280 0.34 0.34 0.32 1 1.00 238 0.28 0.99 3 0.00 0.01
137 22404 3 7468.00 22347 22345 2 0.50 0.50 0.00 2 1.00 10789 0.48 1.00 3 0.00 0.00
138 16505 3 5501.67 196 4147 9464 0.01 0.30 0.69 2 1.67 2259 0.14 0.98 37 0.00 0.02
139 1431 5 286.20 196 951 380 0.13 0.62 0.25 1 0.60 196 0.14 0.98 5 0.00 0.02
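
Several columns in the table above appear to be simple ratios of other columns. The sketch below shows relationships that are consistent with the reported values (checked here against dataset 3); the formulas are inferred from the table, not quoted from the paper, and the function name is illustrative.

```python
def derived_features(total_size, n_snapshots, n_inserts, n_deletes, n_comm,
                     total_instances, total_structural):
    """Recompute the ratio columns from the count columns (inferred formulas)."""
    changes = n_inserts + n_deletes + n_comm        # all compared statements
    typed = total_instances + total_structural      # instance + structural statements
    return {
        "avgSize": total_size / n_snapshots,
        "ratioInserts": n_inserts / changes,
        "ratioDeletes": n_deletes / changes,
        "ratioComm": n_comm / changes,
        "ratioInstances": total_instances / total_size,
        "ratioStructural": total_structural / total_size,
        # Guard against chains with no typed statements at all.
        "ratioInstancesVSIS": total_instances / typed if typed else 0.0,
        "ratioStructuralVSIS": total_structural / typed if typed else 0.0,
    }


# Dataset 3 from the table: totalSize=736, nSnapshots=4, nInserts=233,
# nDeletes=367, nComm=222, totalInstances=121, totalStructural=12.
row3 = derived_features(736, 4, 233, 367, 222, 121, 12)
print({name: round(value, 2) for name, value in row3.items()})
```

Under these inferred definitions ratioInserts, ratioDeletes and ratioComm always sum to 1, as do ratioInstancesVSIS and ratioStructuralVSIS.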

Ranked selected features on input datasets

The following table shows the 10 top-ranked features according to the RELIEF feature selection algorithm for the LOV and SPARQL datasets. A 0 means that no further significant features were selected. Other integer values correspond to the features discussed in the paper with the following equivalences:

id dataset 1stFeat 2ndFeat 3rdFeat 4thFeat 5thFeat 6thFeat 7thFeat 8thFeat 9thFeat 10thFeat
1 lov/adms-4-1-1-allDrift-T 2 3 4 1 6 5 7 15 16 8
2 lov/aiiso-3-1-1-allDrift-T 5 6 0 0 0 0 0 0 0 0
3 lov/bag-3-1-1-allDrift-T 5 6 0 0 0 0 0 0 0 0
4 lov/basic-5-1-1-allDrift-T 6 4 3 2 1 11 10 12 16 15
5 lov/bbc-4-1-1-allDrift-T 1 3 2 4 5 6 12 13 8 7
6 lov/bbccms-3-1-1-allDrift-T 5 6 0 0 0 0 0 0 0 0
7 lov/bbccore-4-1-1-allDrift-T 1 4 2 3 5 6 10 12 13 11
8 lov/bibo-3-1-1-allDrift-T 6 5 0 0 0 0 0 0 0 0
9 lov/bio-7-1-1-allDrift-T 6 5 5 6 6 5 8 10 9 14
10 lov/biro-3-1-1-allDrift-T 3 4 2 1 0 0 0 0 0 0
11 lov/c4o-4-1-1-allDrift-T 2 3 4 1 6 5 7 15 16 8
12 lov/cnt-3-1-1-allDrift-T 16 15 7 10 9 8 11 14 13 12
13 lov/co-6-1-1-allDrift-T 5 6 4 1 3 2 15 14 16 11
14 lov/cogs-4-1-1-allDrift-T 6 6 5 5 0 0 0 0 0 0
15 lov/cold-5-1-1-allDrift-T 8 16 9 10 15 11 14 13 7 12
16 lov/comm-5-1-1-allDrift-T 1 4 2 3 5 6 4 3 2 1
17 lov/d2rq-9-1-1-allDrift-T 6 6 6 5 5 5 1 3 2 4
18 lov/dcat-6-1-1-allDrift-T 1 4 2 3 5 6 15 11 12 13
19 lov/dcite-9-1-1-allDrift-T 11 9 10 12 13 8 7 16 15 14
20 lov/earl-7-1-1-allDrift-T 15 16 14 11 10 9 8 7 13 12
21 lov/ebucore-7-1-1-allDrift-T 16 8 7 10 9 11 15 14 13 12
22 lov/edm-4-1-1-allDrift-T 2 3 4 1 15 7 8 16 10 13
23 lov/emp-4-1-1-allDrift-T 1 1 3 2 4 3 2 4 5 5
24 lov/fabio-15-1-1-allDrift-T 16 14 12 13 15 7 16 7 11 10
25 lov/foaf-10-1-1-allDrift-T 16 15 14 11 12 10 16 7 8 9
26 lov/food-6-1-1-allDrift-T 1 1 3 4 2 4 3 2 5 6
27 lov/geofla-3-1-1-allDrift-T 3 4 2 1 0 0 0 0 0 0
28 lov/geom-4-1-1-allDrift-T 5 6 5 6 13 16 12 14 10 11
29 lov/gn-12-1-1-allDrift-T 5 7 12 8 13 6 0 0 0 0
30 lov/gndo-4-1-1-allDrift-T 16 8 15 10 9 7 11 14 13 12
31 lov/hr-3-1-1-allDrift-T 16 15 7 10 9 8 11 13 14 12
32 lov/itsmo-5-1-1-allDrift-T 13 8 7 12 14 15 16 12 13 7
33 lov/ldp-3-1-1-allDrift-T 6 5 0 0 0 0 0 0 0 0
34 lov/lemon-4-1-1-allDrift-T 6 6 5 5 0 0 0 0 0 0
35 lov/lexinfo-6-1-1-allDrift-T 6 16 8 10 9 11 7 12 13 14
36 lov/lgdo-4-1-1-allDrift-T 16 7 15 10 8 9 11 12 13 14
37 lov/lingvo-13-1-1-allDrift-T 16 8 15 10 9 7 11 14 12 13
38 lov/lv-16-1-1-allDrift-T 11 14 13 15 9 16 12 10 8 7
39 lov/marl-4-1-1-allDrift-T 16 8 15 10 9 7 11 13 14 12
40 lov/md-4-1-1-allDrift-T 16 8 15 10 9 7 11 12 14 13
41 lov/mrel-7-1-1-allDrift-T 16 7 15 14 13 16 1 2 5 4
42 lov/msm-4-1-1-allDrift-T 16 13 14 11 12 10 15 7 8 9
43 lov/mtlo-3-1-1-allDrift-T 5 6 3 2 0 0 0 0 0 0
44 lov/music-3-1-1-allDrift-T 5 6 4 3 1 0 0 0 0 0
45 lov/nif-10-1-1-allDrift-T 5 5 5 5 3 4 13 12 9 10
46 lov/ntag-3-1-1-allDrift-T 6 5 0 0 0 0 0 0 0 0
47 lov/ocd-4-1-1-allDrift-T 1 3 4 2 12 13 10 11 8 9
48 lov/opmw-4-1-1-allDrift-T 5 10 8 7 15 9 16 11 14 13
49 lov/org-10-1-1-allDrift-T 8 7 12 13 3 4 14 16 9 10
50 lov/pattern-3-1-1-allDrift-T 6 1 2 5 3 4 0 0 0 0
51 lov/pav-8-1-1-allDrift-T 16 8 15 10 9 7 11 13 14 12
52 lov/po-3-1-1-allDrift-T 5 6 0 0 0 0 0 0 0 0
53 lov/poste-5-1-1-allDrift-T 1 3 4 2 5 6 7 15 16 8
54 lov/pro-11-1-1-allDrift-T 1 4 3 2 5 6 13 8 15 14
55 lov/prov-5-1-1-allDrift-T 4 3 2 1 13 12 8 16 10 11
56 lov/prv-8-1-1-allDrift-T 1 2 1 2 0 0 0 0 0 0
57 lov/pso-8-1-1-allDrift-T 1 3 2 4 6 5 11 13 14 15
58 lov/qb-4-1-1-allDrift-T 8 12 7 13 16 15 14 1 9 10
59 lov/ruto-3-1-1-allDrift-T 16 15 7 10 9 8 11 13 14 12
60 lov/sam-3-1-1-allDrift-T 1 2 3 4 6 0 0 0 0 0
61 lov/semio-4-1-1-allDrift-T 5 5 6 1 4 3 2 6 15 16
62 lov/sio-7-1-1-allDrift-T 16 10 9 12 11 13 7 16 15 14
63 lov/sport-4-1-1-allDrift-T 5 5 6 6 1 2 3 4 15 16
64 lov/spt-5-1-1-allDrift-T 16 13 14 11 12 10 16 7 15 9
65 lov/taxon-5-1-1-allDrift-T 5 5 9 11 12 13 14 10 8 16
66 lov/teach-6-1-1-allDrift-T 1 4 6 5 3 2 14 13 16 11
67 lov/thors-5-1-1-allDrift-T 6 5 4 1 3 2 12 13 14 15
68 lov/tisc-5-1-1-allDrift-T 3 1 4 5 6 2 1 6 3 5
69 lov/tm-4-1-1-allDrift-T 6 5 4 6 0 0 0 0 0 0
70 lov/txn-8-1-1-allDrift-T 5 5 5 5 9 8 10 16 11 12
71 lov/umbel-8-1-1-allDrift-T 6 6 6 6 5 5 5 5 12 11
72 lov/vivo-10-1-1-allDrift-T 1 1 6 1 6 1 1 6 6 2
73 lov/voaf-7-1-1-allDrift-T 5 6 4 5 2 3 6 1 1 4
74 sparql/fao-3-1-1-allDrift-T 16 15 7 10 9 8 11 12 13 14
75 sparql/lingvoj-5-1-1-allDrift-T 6 5 10 15 14 9 13 12 16 8
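
To illustrate how a ranking like the one above can be produced, here is a minimal sketch of the basic two-class Relief algorithm: each feature is rewarded when it differs on the nearest instance of the other class and penalised when it differs on the nearest instance of the same class. This is a simplified textbook variant, not necessarily the RELIEF implementation used for the paper. The function names, the `threshold` cut-off used to emit 0 for non-selected features, and the toy data are all assumptions made for illustration.

```python
def relief_weights(X, y):
    """Basic Relief for two classes, visiting every instance once."""
    n, d = len(X), len(X[0])
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))      # nearest same-class
        miss = min(misses, key=lambda k: dist(X[i], X[k]))   # nearest other-class
        for j in range(d):
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return w


def rank_features(X, y, top=10, threshold=0.0):
    """Return 1-based feature ids by decreasing weight, padded with 0."""
    w = relief_weights(X, y)
    order = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    ranked = [j + 1 for j in order if w[j] > threshold][:top]
    return ranked + [0] * (top - len(ranked))


# Toy data: feature 1 separates the two classes, feature 2 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y, top=3))
```

On the toy data only the first feature receives a positive weight, so the remaining positions are filled with 0, mirroring the layout of the table.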

Relationship between dataset features and best classifier performance

Through regression, we analyse which dataset characteristics are good predictors of the performance of the best classifier selected by our approach, using the area under the ROC curve as the response variable. We find that, under the assumptions of normality and independence, the predictors nSnapshots, avgTreeDepth, ratioStructural, ratioInserts and ratioComm are good explanatory variables (i.e. have an influence) with respect to the performance of change detection in version chains. The first model, which includes ratioInserts and discards ratioDeletes and ratioComm due to multicollinearity (the three ratios sum to 1 for the datasets listed above), shows the best fit to the data. The table below reports the three fitted models, with standard errors in parentheses.

Dependent variable: roc
                                (1)         (2)         (3)
log(nSnapshots)                 0.365***    0.350***    0.370***
                                (0.080)     (0.085)     (0.083)
log(avgGap)                     -0.001      0.013       0.006
                                (0.031)     (0.032)     (0.032)
log(totalSize)                  -0.029      -0.023      -0.029
                                (0.021)     (0.023)     (0.022)
avgTreeDepth                    0.114***    0.113***    0.115***
                                (0.038)     (0.039)     (0.038)
ratioInstances                  0.465       0.426       0.477
                                (0.327)     (0.343)     (0.336)
ratioStructural                 -1.858**    -1.711**    -1.895**
                                (0.755)     (0.804)     (0.782)
ratioInserts                    0.748***
                                (0.249)
ratioDeletes                                0.173
                                            (0.253)
ratioComm                                               -0.265*
                                                        (0.136)
Constant                        -0.125      -0.064      0.154
                                (0.240)     (0.249)     (0.261)
Observations                    131         131         131
R²                              0.269       0.218       0.239
Adjusted R²                     0.228       0.174       0.196
Residual Std. Error (df = 123)  0.345       0.357       0.352
F Statistic (df = 7; 123)       6.471***    4.908***    5.517***
Note: *p<0.1; **p<0.05; ***p<0.01
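
As a quick consistency check on the fit statistics reported above, adjusted R² and the F statistic can be recomputed from R², the number of observations (131) and the number of predictors (7). The sketch below uses the standard OLS formulas; the function names are illustrative and not taken from the paper. Because the inputs are the rounded R² values from the table, the results agree with the reported figures only up to rounding.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(r2, n_obs, n_predictors):
    """Overall F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    df_resid = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1.0 - r2) / df_resid)


# R^2 values of models (1), (2) and (3) as reported in the table.
for model, r2 in ((1, 0.269), (2, 0.218), (3, 0.239)):
    print(model,
          round(adjusted_r2(r2, 131, 7), 3),
          round(f_statistic(r2, 131, 7), 2))
```

With 131 observations and 7 predictors the residual degrees of freedom are 123, matching the df reported for the residual standard error and the F statistic.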

Relationship between dataset features and best chosen classifier

Through multinomial logistic regression, we analyse which dataset characteristics are good predictors of the classifier type selected as best in our approach. We find that avgGap is influential in selecting a tree classifier instead of a Bayes one. We also find that totalSize is influential in selecting function-based and rule-based classifiers instead of Bayes ones. The pictures below show simulations of how these predictors influence the choice of the different classifier families, in one overall figure and two detailed ones (simulating the effect of avgGap and totalSize).

Dependent variable:
                  functions  rules      trees      functions  rules      trees      functions  rules      trees
                  (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)
log(nSnapshots)   -0.291     -0.257     1.975      -0.180     -0.239     1.745      -0.193     -0.212     1.838
                  (0.656)    (0.765)    (1.503)    (0.680)    (0.790)    (1.512)    (0.667)    (0.777)    (1.497)
log(avgGap)       0.238      0.145      1.385*     0.266      0.173      1.269*     0.248      0.161      1.351*
                  (0.242)    (0.271)    (0.734)    (0.240)    (0.269)    (0.703)    (0.240)    (0.270)    (0.729)
log(totalSize)    0.669***   0.539*     -0.052     0.636**    0.531*     -0.010     0.641***   0.524*     -0.025
                  (0.249)    (0.278)    (0.563)    (0.251)    (0.282)    (0.555)    (0.249)    (0.279)    (0.557)
avgTreeDepth      -0.399     -0.334     0.534      -0.393     -0.336     0.564      -0.385     -0.323     0.553
                  (0.302)    (0.330)    (0.719)    (0.304)    (0.334)    (0.728)    (0.303)    (0.332)    (0.728)
ratioInstances    1.378      2.463      3.090      1.071      2.246      3.394      1.269      2.330      3.221
                  (3.485)    (4.021)    (6.654)    (3.455)    (3.981)    (6.629)    (3.476)    (4.005)    (6.649)
ratioStructural   -9.054     1.357      -9.539     -9.039     1.674      -10.799    -9.594     1.116      -10.030
                  (6.040)    (6.135)    (13.505)   (6.142)    (6.353)    (13.945)   (6.136)    (6.267)    (13.827)
ratioInserts      3.006      2.376      -3.540
                  (1.906)    (2.210)    (4.401)
ratioDeletes                                       1.918      0.929      -2.341
                                                   (1.907)    (2.154)    (4.058)
ratioComm                                                                           -1.440     -0.945     1.615
                                                                                    (1.028)    (1.170)    (2.219)
Constant          -5.610**   -5.580**   -12.702**  -5.288**   -5.259**   -12.402**  -4.059*    -4.494*    -14.266**
                  (2.248)    (2.511)    (5.954)    (2.210)    (2.494)    (5.759)    (2.265)    (2.585)    (6.511)
Akaike Inf. Crit. 313.543    313.543    313.543    316.179    316.179    316.179    314.605    314.605    314.605
Note: *p<0.1; **p<0.05; ***p<0.01
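
To make the coefficients above easier to interpret, the following sketch turns the first model (columns 1 to 3, which include ratioInserts) into predicted probabilities for each classifier family. It assumes that Bayes is the baseline category with a linear score of 0, that log denotes the natural logarithm, and that probabilities follow the usual multinomial-logit (softmax) form; the function and key names are illustrative, and this is not the simulation code used for the figures.

```python
import math

# Coefficients copied from columns (1)-(3) of the table; Bayes is the baseline.
COEF = {
    "functions": {"const": -5.610, "log_nSnapshots": -0.291, "log_avgGap": 0.238,
                  "log_totalSize": 0.669, "avgTreeDepth": -0.399,
                  "ratioInstances": 1.378, "ratioStructural": -9.054,
                  "ratioInserts": 3.006},
    "rules": {"const": -5.580, "log_nSnapshots": -0.257, "log_avgGap": 0.145,
              "log_totalSize": 0.539, "avgTreeDepth": -0.334,
              "ratioInstances": 2.463, "ratioStructural": 1.357,
              "ratioInserts": 2.376},
    "trees": {"const": -12.702, "log_nSnapshots": 1.975, "log_avgGap": 1.385,
              "log_totalSize": -0.052, "avgTreeDepth": 0.534,
              "ratioInstances": 3.090, "ratioStructural": -9.539,
              "ratioInserts": -3.540},
}


def make_features(n_snapshots, avg_gap, total_size, avg_tree_depth,
                  ratio_instances, ratio_structural, ratio_inserts):
    """Apply the log transforms used in the regression (natural log assumed)."""
    return {"log_nSnapshots": math.log(n_snapshots), "log_avgGap": math.log(avg_gap),
            "log_totalSize": math.log(total_size), "avgTreeDepth": avg_tree_depth,
            "ratioInstances": ratio_instances, "ratioStructural": ratio_structural,
            "ratioInserts": ratio_inserts}


def class_probabilities(x):
    """Softmax over linear scores, with the Bayes baseline fixed at 0."""
    scores = {"bayes": 0.0}
    for family, beta in COEF.items():
        scores[family] = beta["const"] + sum(beta[k] * x[k] for k in x)
    top = max(scores.values())                      # subtract max for stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


# Dataset 3 from the characteristics table, then the same chain 100x larger.
small = class_probabilities(make_features(4, 181.33, 736, 1.00, 0.16, 0.02, 0.28))
large = class_probabilities(make_features(4, 181.33, 73600, 1.00, 0.16, 0.02, 0.28))
print(small)
print(large)
```

Because the log(totalSize) coefficient for functions is positive, the odds of a function-based classifier relative to a Bayes one grow with totalSize; the same holds for trees with respect to avgGap.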