javascript-clone-detection

Academic study project on JavaScript code duplication using AST syntactic analysis

JavaScript Clone Detection - (v0.6.0)

Academic study project on JavaScript code duplication using AST parsing with text similarity.

Usage

Run:

make init
clone-analisys <PATH> <SIMILARITY INDEX>
// clone-analisys src/api-server 0.85

Current Process

We take a piece of code and convert it into an Abstract Syntax Tree (AST). A cleaning and normalization phase follows, in which we remove unwanted attributes and standardize equivalent structures, for example by converting an arrow function into a regular function.

// both code snippets are classified as type 2 clones

const arrowFunction = (value) => {
  const { type } = value
  return type
}

function regularFunction(value) {
  // this is a regular function
  const { type } = value
  return type
};
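
The two normalization steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes ESTree-shaped nodes, the helper names (`stripAttributes`, `arrowToFunctionDeclaration`, `normalize`) and the list of dropped attributes are hypothetical, and it does not rename identifiers.

```javascript
// Hypothetical list of attributes treated as "unwanted" (an assumption).
const DROPPED_KEYS = new Set(['start', 'end', 'loc', 'range']);

// Step 1: recursively copy a node, dropping position-related attributes.
function stripAttributes(node) {
  if (Array.isArray(node)) return node.map(stripAttributes);
  if (node === null || typeof node !== 'object') return node;
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    if (!DROPPED_KEYS.has(key)) copy[key] = stripAttributes(value);
  }
  return copy;
}

// Step 2: rewrite `const f = (...) => { ... }` as `function f(...) { ... }`.
function arrowToFunctionDeclaration(node) {
  if (Array.isArray(node)) return node.map(arrowToFunctionDeclaration);
  if (node === null || typeof node !== 'object') return node;
  const isArrowConst =
    node.type === 'VariableDeclaration' &&
    node.declarations.length === 1 &&
    node.declarations[0].init &&
    node.declarations[0].init.type === 'ArrowFunctionExpression' &&
    node.declarations[0].init.body.type === 'BlockStatement';
  if (isArrowConst) {
    const { id, init } = node.declarations[0];
    return {
      type: 'FunctionDeclaration',
      id,
      params: init.params,
      body: arrowToFunctionDeclaration(init.body),
    };
  }
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    copy[key] = arrowToFunctionDeclaration(value);
  }
  return copy;
}

function normalize(ast) {
  return arrowToFunctionDeclaration(stripAttributes(ast));
}

// Hand-written, simplified ASTs for an arrow function and a regular function.
const param = { type: 'Identifier', name: 'value' };
const body = {
  type: 'BlockStatement',
  body: [{ type: 'ReturnStatement', argument: { type: 'Identifier', name: 'value' } }],
};
const arrowAst = {
  type: 'VariableDeclaration', kind: 'const', start: 0, end: 40,
  declarations: [{
    type: 'VariableDeclarator',
    id: { type: 'Identifier', name: 'getType' },
    init: { type: 'ArrowFunctionExpression', params: [param], body },
  }],
};
const functionAst = {
  type: 'FunctionDeclaration', start: 0, end: 45,
  id: { type: 'Identifier', name: 'getType' },
  params: [param],
  body,
};

const sameAfterNormalization =
  JSON.stringify(normalize(arrowAst)) === JSON.stringify(normalize(functionAst));
```

In this sketch both inputs normalize to the same `FunctionDeclaration` shape, so a strict comparison no longer distinguishes them. A real parser emits more fields per node (for example `async` or `generator`), which a fuller normalization pass would also need to reconcile.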

Several good libraries can parse code snippets into an AST, such as:

Library               Version
espree                7.3.1
@babel/parser         7.14.7
abstract-syntax-tree  2.19.1

In this project we use abstract-syntax-tree because it offers more convenient facilities for manipulating an AST.

Similarity between ASTs

To compare ASTs in the current version, we had two options: i) compare the pure ASTs directly, which only tells us whether they are identical or not; or ii) convert the ASTs to text (strings) and use libraries that measure the textual similarity between the code snippets.
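
As a rough illustration of the difference between the two options, the sketch below pairs a strict structural check (a boolean answer) with a serialization step that produces text for a similarity metric. Both functions are hypothetical stand-ins: `astEquals` is not the algorithm used by ast-compare, and using `JSON.stringify` as the text form is an assumption, since the project may generate text differently.

```javascript
// Option i: strict structural equality. The only possible answers are
// "identical" (true) or "not identical" (false).
function astEquals(a, b) {
  if (a === b) return true;
  if (a === null || b === null) return false;
  if (typeof a !== 'object' || typeof b !== 'object') return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && astEquals(a[key], b[key])
  );
}

// Option ii: turn the AST into a string so that a textual similarity
// metric can return a graded score instead of a boolean.
function astToText(ast) {
  return JSON.stringify(ast);
}

// Two tiny hand-written nodes that differ only in the identifier name.
const left = { type: 'ReturnStatement', argument: { type: 'Identifier', name: 'type' } };
const right = { type: 'ReturnStatement', argument: { type: 'Identifier', name: 'kind' } };

const identical = astEquals(left, right);
const leftText = astToText(left);
const rightText = astToText(right);
```

Here `identical` is `false` even though the two nodes are almost the same, while `leftText` and `rightText` share most of their characters, which is the information a textual metric can exploit.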

Library            Version  Type
ast-compare        2.1.0    Compare ASTs
string-similarity  4.0.4    Compare strings
string-comparison  1.0.9    Compare strings

Comparing ASTs directly seems the most coherent choice, but so far the ast-compare library can only tell whether two pieces are identical or not. Even so, the Abstract Syntax Tree representation remains useful: it is uniform and easy to manipulate for pre-processing and normalization, and it can then be converted to text and compared as a textual element.
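
To make the textual comparison concrete, here is a minimal hand-written Dice coefficient over character bigrams. It is a sketch of the general technique only, not the code of string-similarity or string-comparison; the function names are hypothetical, and treating the similarity index from the usage example (0.85) as a "greater than or equal" threshold is an assumption.

```javascript
// Count the character bigrams of a string (multiset, stored in a Map).
function bigrams(text) {
  const counts = new Map();
  for (let i = 0; i < text.length - 1; i += 1) {
    const pair = text.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Dice coefficient: 2 * |shared bigrams| / (|bigrams A| + |bigrams B|).
// Whitespace is removed first so that formatting does not affect the score.
function diceSimilarity(first, second) {
  const a = first.replace(/\s+/g, '');
  const b = second.replace(/\s+/g, '');
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const countsA = bigrams(a);
  const countsB = bigrams(b);
  let shared = 0;
  for (const [pair, countA] of countsA) {
    shared += Math.min(countA, countsB.get(pair) || 0);
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

// Hypothetical decision rule: two snippets count as clones when their
// text forms reach the chosen similarity index.
function isClone(textA, textB, similarityIndex = 0.85) {
  return diceSimilarity(textA, textB) >= similarityIndex;
}
```

A graded score like this is what lets near-identical snippets be reported as clones even when a strict AST comparison returns false.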

Results

Using the example code snippets above, we have:

Without pre-processing and normalization

ast-compare:  false
string-similarity (Dice):  0.925351071692535
string-comparison (Cosine):  0.9672041516493517
string-comparison (Levenshtein):  0.9072164948453608
string-comparison (Longest Common Subsequence):  0.9357933579335793
string-comparison (Metric Longest Common Subsequence):  0.9337260677466863

With pre-processing and normalization (v0.3.1)

ast-compare:  true
string-similarity (Dice):  1
string-comparison (Cosine):  1
string-comparison (Levenshtein):  1
string-comparison (Longest Common Subsequence):  1
string-comparison (Metric Longest Common Subsequence):  1

To learn more about the issues addressed, read: ESTUDO EMPÍRICO SOBRE DUPLICAÇÃO DE CÓDIGO EM APLICAÇÕES REACT.JS (Empirical Study on Code Duplication in React.js Applications).