Bad Science: Incessant Protocol Comparisons

An example of "bad science, bad results", from a "protocol comparison" paper published in Airccse's "International Journal of Ubiquitous Computing (IJU)"

One very, very, very (, very, very, very, …) common type of paper in my scientific domain is the “protocol comparison”. Such papers are very, very, very often of the form:

  1. Pick a domain (e.g., sensor networking, MANET, multicast…) and a topic (e.g., routing).
  2. Take the set of protocols within that domain and topic which are already implemented by somebody else and freely available in the network simulator ns-2.
  3. Completely ignore previous work within that domain and topic, or with the given set of protocols.
  4. Define a minimal set of scenarios, and run one simulation for each such scenario.
  5. Write up a paper, often poorly, presenting “the results”.
  6. Find a publisher willing to accept said paper for publication.

As should be clear, I am not really a fan of such papers.

First, a very, very large body of protocol simulation papers already exists. In fact, the very existence of an implementation of a protocol in a network simulator should be a pretty good hint that someone likely already has simulated the protocol – and, if so, likely has published her or his findings. And, if implementations of two or more protocols solving the same problem exist in a given network simulator, then that’s also a pretty good hint that someone likely already has compared these protocols – and, also, published the findings of her or his comparison.

And yet, way too often, protocol comparison papers are published without the authors having made any effort to study previous work: either work studying the individual protocols in question, or work comparing them.

Second, such papers very often not only ignore related simulation studies, but also compare irrelevant, or outdated, protocols – likely because those are the implementations that happen to be available. For protocols developed under the auspices of the IETF, for example, working documents (called Internet-Drafts) explicitly expire and carry a disclaimer regarding citation, one most often ignored in such protocol comparison papers:

“Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ‘work in progress.’”

In the same vein, protocols evolve – often for good reasons, such as fixing a bug, improving performance, or changing or expanding their applicability. Again, such evolution is also often ignored in protocol comparison papers – as before, likely because of which implementations are available and provided with the chosen network simulator.
Worse, all too often such protocol comparison papers completely ignore the history of protocol evolution – typically by simply ignoring, or failing to acknowledge, that a more recent, updated version exists. The really sad thing is when the reviewers let such a paper get past their “filter” and it gets published.

There are, of course, two points to be made here:

  • There may be valid reasons for studying “old versions of a protocol”, for example to understand if a “new version” offers specific improvements, etc. – which, of course, also requires having an implementation of the “new version” …
  • It is, however, never valid to study an “old version of a protocol” out of ignorance.

“But, the more recent protocol version isn’t available in the network simulator I am using” – that’s actually not an argument for studying an “old version”, but rather an opportunity, and therefore a reason: implementing and studying a “newer version” could yield interesting findings, whereas studying an old version most likely wouldn’t.

Third, such papers will show graphs. Typically, lots of graphs, such as the one in this posting. Often these graphs will show lines (representing “data”) that are all over the place – and, very, very, very often, with little interpretation or explanation of the results, and without error bars or any other indicator of the “quality” or statistical significance of the results.

Almost systematically, this is because of step 4 in the list above: the authors define a minimal set of scenarios and run one simulation for each such scenario. Statistics on a sample size of one are … not meaningful. Nor are results from studying just a single scenario.
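To illustrate what “not a sample size of one” looks like in practice, here is a minimal sketch – not from the paper, and with run_simulation() as a purely hypothetical stand-in for launching one simulator scenario with a given random seed – of reporting a metric as a mean with a confidence interval over repeated runs:

```python
# Minimal sketch: report a simulated metric as mean ± confidence interval
# over repeated, independently seeded runs, instead of a single run.
# run_simulation() is a hypothetical placeholder for launching one scenario
# (e.g. an ns-2 run) with a given seed and extracting one metric from it.
import math
import random
import statistics

def run_simulation(seed: int) -> float:
    """Placeholder: run one scenario with this seed and return, say, the
    average end-to-end delay in seconds (synthetic numbers here)."""
    rng = random.Random(seed)
    return 0.050 + rng.gauss(0.0, 0.010)

def mean_with_ci(samples: list[float], z: float = 1.96) -> tuple[float, float]:
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return mean, z * stdev / math.sqrt(len(samples))

# Thirty independent runs of the *same* scenario, each with a different seed:
samples = [run_simulation(seed) for seed in range(30)]
mean, half_width = mean_with_ci(samples)
print(f"average delay = {mean:.3f} s ± {half_width:.3f} s (~95% CI)")
```

Error bars derived this way are precisely what the graphs in such papers lack.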

Considering the figure included here for illustrative purposes (but actually extracted from a protocol comparison paper published in Airccse’s “International Journal of Ubiquitous Computing (IJU)” – a paper to which I won’t provide a more precise reference, so as to not offend the authors): even without understanding the protocols or the metrics, the curves (especially for DSDV) drop sharply around “number of nodes = 50” and rise immediately thereafter – an outlier result. And the curves generally fluctuate from one sample point to the next.

Why is that? Is it a characteristic of the protocol? Is it a bug in the protocol implementation, or in the simulator? Or is it an artefact of the single scenario studied? The authors of the paper in question do not attempt to answer that question – nor do they even notice the outlier result.

And this leads to…

Fourth, often outlier results are not explained in such papers, and conclusions are not drawn by the authors. And not enough data is presented for the reader to draw them herself or himself. In other words, potentially interesting behaviors are not investigated. That’s just bad science.

In the figure included, the reason the “Average End-to-End Delay” curve for DSDV drops sharply is (likely, in my opinion) that (i) fewer data packets are delivered successfully, and (ii) the “Average End-to-End Delay” is calculated only over successfully delivered packets. But, as the paper does not include a corresponding “data delivery ratio” graph, this remains but an unconfirmed and unconfirmable hypothesis.
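The effect behind hypothesis (ii) is easy to illustrate with synthetic numbers (mine, not the paper’s): if only packets travelling short, low-delay paths still get through, the average delay over delivered packets drops even as the delivery ratio collapses.

```python
# Synthetic illustration (made-up numbers, not from the paper): an "average
# end-to-end delay" computed only over delivered packets can *drop* when
# delivery collapses, because only the low-delay packets still get through.
def average_delay_ms(delivered_delays_ms: list[float]) -> float:
    """Average delay over *delivered* packets only, as such papers compute it."""
    return sum(delivered_delays_ms) / len(delivered_delays_ms)

total_sent = 1000

# Scenario A: 90% of packets delivered, delays spread between 20 ms and 200 ms.
delivered_a = [20 + (i % 10) * 20 for i in range(900)]
# Scenario B: only 30% delivered, and only the short-path (low-delay) packets.
delivered_b = [20 + (i % 3) * 10 for i in range(300)]

print(f"A: delivery ratio {len(delivered_a)/total_sent:.0%}, "
      f"avg delay {average_delay_ms(delivered_a):.0f} ms")
print(f"B: delivery ratio {len(delivered_b)/total_sent:.0%}, "
      f"avg delay {average_delay_ms(delivered_b):.0f} ms")
# B "wins" on the delay graph while losing 70% of the traffic - which is why
# a delay plot without the corresponding delivery-ratio plot is uninterpretable.
```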

These, combined, lead to the almost stereotypical “protocol comparison by way of network simulation” paper, of absolutely zero interest and scientific value – and a lot of these are produced and published every year. It is befuddling that there are academic conferences and journals which accept them, and that they are not more systematically rejected by peer reviewers. Clearly, peer review doesn’t always work.

To be fair, protocol comparisons can be well done, can be interesting, can present scientifically useful conclusions, and can present innovative results. I’m even guilty of having performed, and published, a few such protocol comparisons myself over my career – and, am likely to produce more.

However, doing a relevant, interesting study is always hard work – doing so in a simulator does not make it any less so. It still requires a complete understanding of the state-of-the-art, and of the problem domain. It requires construction of a valid experiment, collection of statistically significant data, and efforts to draw and present valid conclusions.

I recently came across the paper from which I extracted some of the examples in this blog entry. As indicated, that paper was published in Airccse’s “International Journal of Ubiquitous Computing (IJU)”, which claims to be:

 a quarterly open access peer-reviewed journal that provides excellent international forum for sharing knowledge and results in theory, methodology and applications of ubiquitous computing

Published in this journal as recently as 2015, the paper in question is guilty of variations of pretty much all the “sins” discussed above.

For example, the paper has a related works section, which states:

Several works have been done concerning the performance evaluation of many MANET routing protocols. We focus on those works performed by network simulator NS-2[4].

Table 1 shows that comparative performance evaluation for all the parameters namely Packet Delivery Ratio, Throughput, Average End to End Delay, Jitter, Routing Load, and Routing Frequency among the routing protocols have not been done in a single paper.

In our article, we will compare five MANET protocols (AODV, DSR, DSDV, OLSR, and DYMO). There is no work in our knowledge in the literature which deals with these five MANET routing protocols by considering the variation of Number of Nodes parameter.

The network simulator used, ns-2, comes with implementations of these five protocols, which is likely why the authors chose them. However, looking more carefully at the set of protocols:

  • DYMO was “just” the evolution of and successor to AODV – and DYMO was itself succeeded by a protocol named … AODVv2 – this already at the time when the paper was written. DSDV was an ancient precursor to AODV. While neither DYMO nor AODVv2 were (and still are not) actually standardised, when choosing to study a protocol, understanding whether the protocol is still relevant seems a minimum.
  • On the topic of relevance, AODV was published as the Experimental RFC 3561 in July 2003 – yet the paper cites an earlier draft version of the protocol specification:

[17] Perkins, E. Belding-Royer, and S. Das, “Ad hoc On-Demand Distance Vector (AODV) Routing,” draft-ietf-manet-aodv-13.txt, Feb. 2003

So this particular paper compares AODV, DSR, and OLSR (and DSDV). That’s been done before (for example, I committed a comparison of these protocols in 2002). The protocols compared are old and outdated. The metrics studied are the most classic ones: overhead, data packet delays, … And, as indicated above, the simulation results presented are incomplete, pretty much rendering the paper useless.

The authors are, of course, to blame… but those most to blame are Airccse and the editorial board of the “International Journal of Ubiquitous Computing (IJU)” – and, in general, the journal publishers who either skipped peer review altogether, disregarded the opinions of their peer reviewers, or selected poor peer reviewers.