RPKI Analysis Code (for reproducibility of the IMC’19 paper)
Preliminary
To analyze validated BGP announcements, you first need two datasets: historical RPKI objects and public BGP datasets.
Because the datasets are massive (46 billion BGP announcements and 8 years of RPKI objects), we strongly encourage you to use a distributed cluster-computing framework. We used Spark for large-scale data processing; for reference, verifying all 46 billion BGP announcements against RPKI took more than 3 days using ~700 cores and ~4 TB of RAM.
During the validation process, we also use two additional CAIDA datasets, as-organization and as-relationship, to infer (1) the relationship between the AS that originated a BGP announcement and the AS that actually owns the announced IP prefix (i.e., the ASN in the ROA), and (2) the ISP and country associated with each ASN.
Lastly, we used data from the NRO (Number Resource Organization) to infer who allocated and assigned the IP prefixes advertised through BGP.
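The AS-relationship lookup described above can be sketched as follows. This is a minimal illustration, not the code used in the paper: it assumes the pipe-separated CAIDA as-relationship layout (`<AS1>|<AS2>|<rel>`, where `-1` means AS1 is a provider of AS2 and `0` means the two are peers, with `#` comment lines). The function names and the returned labels are hypothetical, and the sample ASNs are reserved documentation ASNs rather than real data.

```python
def load_as_relationships(lines):
    """Parse as-relationship lines into a dict keyed by (as1, as2).

    Assumed line layout: "<as1>|<as2>|<rel>", optionally followed by
    extra fields, which are ignored. Lines starting with '#' are comments.
    """
    rel = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        rel[(int(fields[0]), int(fields[1]))] = int(fields[2])
    return rel


def relationship(rel, origin_as, roa_as):
    """Describe origin_as relative to roa_as (labels are hypothetical)."""
    if origin_as == roa_as:
        return "same"
    if rel.get((roa_as, origin_as)) == -1:
        return "customer"  # the origin AS is a customer of the ROA's AS
    if rel.get((origin_as, roa_as)) == -1:
        return "provider"  # the origin AS is a provider of the ROA's AS
    if rel.get((origin_as, roa_as)) == 0 or rel.get((roa_as, origin_as)) == 0:
        return "peer"
    return "none"


# Illustrative input only (documentation ASNs, not real relationships).
sample = ["# example", "64496|64497|-1", "64498|64499|0"]
relationships = load_as_relationships(sample)
```

With this sample, `relationship(relationships, 64497, 64496)` reports that the origin AS is a customer of the AS in the ROA.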
Summary of source code
Here, we provide the following source code. Instructions and usage are explained below.
filename | Download | Misc. |
---|---|---|
produce-vrps.py | link | You can use Ziggy to produce a set of VRPs. |
data-pruning.py | link | |
spark-verify.py | link | This code runs on a Spark cluster. |
analysis-codes.tar.gz | link | |
plotting-script.tar.gz | link | Gnuplot scripts. |
Reproducing the figures in the IMC’19 paper
1. Generate Validated ROA Payloads (VRPs) from ROAs.
Among the historical RPKI objects, we specifically focus on ROAs to generate VRPs.
The structure of a ROA is defined in RFC 6482.
The script produce-vrps.py generates VRPs from each of the ROAs in the following format, which is used to validate the IP prefixes from BGP announcements. (We used a third-party Python library, rpki.net, to parse ROA objects.)
time | prefix | prefix-len | max-len | ASN | num-covered-ip-addresses | country code | TAL |
---|---|---|---|---|---|---|---|
20170601 | 103.205.38.0 | 24 | 24 | 64076 | 256 | N/A | apnic |
20170601 | 101.101.96.0 | 22 | 24 | 45932 | 1024 | N/A | apnic |
20170601 | 103.1.156.0 | 22 | 24 | 45932 | 1024 | N/A | apnic |
… | … | … | … | … | … | … | … |
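To show how a row in this format can be consumed, here is a minimal sketch that parses one VRP line and checks that the num-covered-ip-addresses column matches the prefix length (2^(32 - prefix-len) for IPv4). The `VRP` type and `parse_vrp_row` are hypothetical names introduced for illustration. The sketch assumes the fields are separated by `|` exactly as in the table above, which may differ from the actual output of produce-vrps.py.

```python
import ipaddress
from typing import NamedTuple


class VRP(NamedTuple):
    time: str
    network: ipaddress.IPv4Network
    max_len: int
    asn: int
    num_addresses: int
    country: str
    tal: str


def parse_vrp_row(row: str) -> VRP:
    """Parse one pipe-separated VRP row (assumed layout, see table above)."""
    fields = [f.strip() for f in row.strip().strip("|").split("|")]
    time, prefix, prefix_len, max_len, asn, num_ips, country, tal = fields
    network = ipaddress.ip_network(f"{prefix}/{prefix_len}")
    return VRP(time, network, int(max_len), int(asn), int(num_ips), country, tal)


vrp = parse_vrp_row("20170601 | 103.205.38.0 | 24 | 24 | 64076 | 256 | N/A | apnic |")
# A /24 covers 2**(32 - 24) = 256 addresses, matching the sixth column.
consistent = vrp.network.num_addresses == vrp.num_addresses
```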
2. Obtain daily unique BGP prefixes from each of the BGP Datasets.
After obtaining the BGP datasets, we remove duplicate IP prefix announcements to reduce the size of the datasets and speed up validation.
The script data-pruning.py removes duplicate entries and outputs the result in the following format:
vantage point | type | time | flag | peer-ip | peer-as | prefix | as-path | protocol |
---|---|---|---|---|---|---|---|---|
rrc00 | BGP4MP | 05/28/18 07:02:51 | A | 111.91.233.1 | 45896 | 0.0.0.0/0 | 45896 3356 | IGP |
rrc00 | BGP4MP | 05/28/18 07:03:52 | A | 111.91.233.1 | 45896 | 100.0.0.0/16 | 45896 3356 701 | IGP |
rrc00 | BGP4MP | 05/28/18 07:03:44 | A | 111.91.233.1 | 45896 | 1.0.0.0/24 | 45896 3356 6762 13335 13335 | IGP |
rrc00 | BGP4MP | 05/28/18 07:03:52 | A | 111.91.233.1 | 45896 | 100.10.0.0/16 | 45896 3356 701 | IGP |
rrc00 | BGP4MP | 05/28/18 07:03:52 | A | 111.91.233.1 | 45896 | 100.1.0.0/16 | 45896 3356 701 | IGP |
rrc00 | BGP4MP | 05/28/18 07:03:52 | A | 111.91.233.1 | 45896 | 100.11.0.0/16 | 45896 3356 701 | IGP |
… | … | … | … | … | … | … | … | … |
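The pruning step can be sketched as below. This is an assumption-laden illustration rather than the logic of data-pruning.py: we assume that two announcements are duplicates when they share the same day, prefix, and origin AS (the last ASN in the AS path). Rows whose AS path ends in an AS set are skipped. The function name is hypothetical.

```python
def unique_daily_announcements(rows):
    """Return unique (day, prefix, origin_as) keys in first-seen order.

    Assumed row layout (pipe-separated): vantage point, type, time, flag,
    peer-ip, peer-as, prefix, as-path, protocol.
    """
    seen = set()
    unique = []
    for row in rows:
        fields = [f.strip() for f in row.strip().strip("|").split("|")]
        if len(fields) != 9:
            continue  # skip malformed rows
        _vp, _type, timestamp, flag, _peer_ip, _peer_as, prefix, as_path, _proto = fields
        hops = as_path.split()
        if flag != "A" or not hops or not hops[-1].isdigit():
            continue  # keep announcements with a plain origin ASN only
        day = timestamp.split()[0]
        key = (day, prefix, int(hops[-1]))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Keeping only one key per day is what shrinks the input: repeated announcements of the same prefix and origin within a day collapse into a single entry to validate.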
3. Validate BGP Datasets (obtained from 2) against VRPs (obtained from 1).
Now you are ready to validate BGP announcements (obtained from 2) using VRPs (obtained from 1).
To validate BGP announcements, we followed the algorithm from RFC 6811.
(For details on how VRPs are used to verify BGP announcements, please refer to BGP Prefix Origin Validation (RFC 6811).)
// This pseudocode follows the algorithm in RFC 6811.
result = BGP_PFXV_STATE_NOT_FOUND;
prefix_exists = FALSE;
//Iterate through all the Covering entries in the local VRP
//database, pfx_validate_table.
entry = next_lookup_result(pfx_validate_table, route_prefix);
while (entry != NULL) {
    prefix_exists = TRUE;
    if (route_prefix_length <= entry->max_length) {
        if (route_origin_as != NONE
            && entry->origin_as != 0
            && route_origin_as == entry->origin_as) {
            result = BGP_PFXV_STATE_VALID;
            return (result);
        }
    }
    entry = next_lookup_result(pfx_validate_table, route_prefix);
}
//If one or more VRP entries Covered the route prefix, but
//none Matched, return "Invalid" validation state.
if (prefix_exists == TRUE) {
    result = BGP_PFXV_STATE_INVALID;
}
return (result);
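The pseudocode above can be sketched in plain Python as follows. This is a single-machine illustration of the RFC 6811 logic, not the Spark implementation in spark-verify.py. The function name, the `(prefix, max_len, asn)` tuple layout, and the returned state strings are assumptions made for this example. It scans the VRPs linearly; for large inputs, a prefix tree would be a better choice.

```python
import ipaddress


def validate_origin(route_prefix, route_origin_as, vrps):
    """Return "valid", "invalid", or "not-found" for one announcement.

    vrps is an iterable of (prefix, max_len, asn) tuples, e.g.
    ("101.101.96.0/22", 24, 45932). route_origin_as may be None when
    the origin cannot be determined.
    """
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp_prefix, max_len, vrp_asn in vrps:
        vrp_net = ipaddress.ip_network(vrp_prefix)
        # A VRP covers the route if the route lies inside the VRP prefix.
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        if (route.prefixlen <= max_len
                and route_origin_as is not None
                and vrp_asn != 0
                and route_origin_as == vrp_asn):
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    return "invalid" if covered else "not-found"


vrps = [("101.101.96.0/22", 24, 45932), ("103.205.38.0/24", 24, 64076)]
state = validate_origin("101.101.96.0/24", 45932, vrps)
```

Note that a /25 inside `101.101.96.0/22` announced by AS 45932 is invalid under this logic, because it exceeds the max-len of 24 even though the origin matches.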
The source code, spark-verify.py, produces the validated results in the following format (please note that we used a Spark cluster for large-scale data processing):
time | prefix-addr | prefix-len | origin | origin-isp | origin-country | Verified Information [1] |
---|---|---|---|---|---|---|
20181227 | 99.108.0.0 | 14 | 7018 | AT&T Services Inc. | US | 1,7018,AT&T Services Inc.,US,None,99.108.0.0/14-14 |
20181227 | 99.112.0.0 | 12 | 7018 | AT&T Services Inc. | US | 1,7018,AT&T Services Inc.,US,None,99.112.0.0/12-12 |
20181227 | 99.192.128.0 | 17 | 27589 | MOJOHOST | US | 1,27589,MOJOHOST,US,None,99.192.128.0/17-24 |
20181227 | 99.32.0.0 | 12 | 7018 | AT&T Services Inc. | US | 1,7018,AT&T Services Inc.,US,None,99.32.0.0/12-12 |
[1] Verified information is the list of validation results for a given BGP announcement, one per covering ROA. Each entry is a tuple of (validation-index, ASN of the covering ROA, ISP of the covering ROA, country code of the covering ROA, relationship between the origin ASN of the BGP announcement and the ASN of the covering ROA, covered IP prefix); the details of validation-index can be found in the source code.
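As an illustration of how one Verified Information entry can be split into its fields, here is a small sketch. It handles a single tuple only, since the examples above do not show how multiple tuples are delimited. It also assumes that the trailing `-N` on the covered prefix is the ROA's max-len, which the examples suggest but do not state. The function name and dictionary keys are hypothetical. Splitting from both ends keeps an ISP name that contains a comma intact.

```python
def parse_verified_entry(entry: str) -> dict:
    """Split one verified-information tuple into named fields."""
    # The first two fields are numeric, so split them off from the left.
    index, asn, rest = entry.split(",", 2)
    # The last three fields are split from the right; whatever remains
    # in the middle is the ISP name, commas included.
    isp, country, relation, covered = rest.rsplit(",", 3)
    prefix, max_len = covered.rsplit("-", 1)
    return {
        "validation_index": int(index),
        "roa_asn": int(asn),
        "roa_isp": isp,
        "roa_country": country,
        "relationship": relation,
        "roa_prefix": prefix,
        "roa_max_len": int(max_len),  # assumption: suffix is the max-len
    }


entry = parse_verified_entry("1,27589,MOJOHOST,US,None,99.192.128.0/17-24")
```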
4. Analyze Validated BGP announcements
The analysis-codes.tar.gz and plotting-script.tar.gz archives contain three analysis scripts and 10 plotting scripts, which generate the figures in the paper. The table below lists, for each figure, the functions that generate the dataset and the gnuplot script that plots it.
file | function | figures in the paper | gnuplot script |
---|---|---|---|
spark-rpki-object-validation.py | runSparkROAsIPCnt, runSparkROAsIPPercentage, runSparkPercentageASesInROAs | Figure 2 | num-vrps-as-ip-ipv4-byIRR.plot |
spark-analysis.py | runSparkCalcRPKIEnabledAdv | Figure 3 | percentage-rpki-enabled-adv-ipv4.plot |
spark-analysis.py | runSparkValidationUniquePrefix, runSparkValidationUniquePrefixAllPrefix | Figure 4, Figure 5 | percentage-rpki-uniq-prefix-asn-invalid-ipv4-merge.plot, percentage-rpki-uniq-prefix-asn-invalid-ipv4-focus.plot |
spark-analysis.py | runSparkValidationUniquePrefix | Figure 6 | num-rpki-uniq-prefix-asn-invalid-adv-reasoning-ipv4.plot |
spark-rpki-object-validation.py | runSparkNumPrefixWithMaxlen | Figure 7 | percentage-ipprefix-with-maxlen-ipv4.plot (a), percentage-rpki-uniq-prefix-asn-merge-adv-hasMaxLen-ipv4.plot (b) |
spark-analysis.py | runSparkClassifyHijackingUniquePrefix | Figure 8 | num-rpki-uniq-prefix-classify-hijack-ipv4.plot |
spark-analysis.py | runSparkClassifyHijackingUniquePrefixDuration | Figure 9 | cdf-num-attack-duration.plot |
spark-analysis.py | runSparkClassifyHijackingUniquePrefix | Figure 10 | percentage-rpki-uniq-prefix-asn-invalid-suspicious-ipv4.plot |
hijack-analysis.py | getPairsOfAttack | Figure 11, Figure 12 | cdf-num-hijacked-ipv4-byAS, cdf-num-attacker-ipv4-byAS.plot |