The duplication engine is based on flay, an open-source, Ruby-only project developed by Ryan Davis.
To enable duplication analysis, add the following to your .codeclimate.yml configuration file, removing any languages which aren't present in your repository:
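A minimal sketch of such a configuration, assuming the same engines / duplication / config / languages nesting that the Python example later in this document uses. The two languages listed are illustrative only, not the full supported set, and the list form of `languages` is an assumption:

```yaml
# .codeclimate.yml (illustrative sketch, not the complete default file)
engines:
  duplication:
    enabled: true
    config:
      languages:
        # Keep only the languages that actually appear in your repository.
        - ruby
        - python
```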
You can also enable the engine via the CLI with:
$ codeclimate engines:enable duplication
This will create a default configuration file for you if you don’t already have one.
More information about the CLI is available in the README here: https://github.com/codeclimate/codeclimate
We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project guidelines.
The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.
If the engine reports duplication too readily, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.
To adjust this setting, add a mass_threshold with your preferred value under a particular enabled language:
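As a hedged illustration, the setting could look like the following. The value 30 is a hypothetical placeholder rather than a recommended or default threshold, and the empty `python:` entry is assumed to mean "enabled with the default threshold":

```yaml
# .codeclimate.yml (illustrative sketch)
engines:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          # Hypothetical value; raise it to report less, lower it to report more.
          mass_threshold: 30
        # Assumed: a language with no settings keeps its default threshold.
        python:
```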
In addition to mass thresholds, the Python duplication analysis can be configured to target a particular version of Python. By default, Python 2 is targeted, but Python 3 users can specify this with a python_version setting:
engines:
  duplication:
    enabled: true
    config:
      languages:
        python:
          python_version: 3
If there are certain checks you would like to ignore, such as Similar Code, you can disable a check within the duplication engine as shown below:
engines:
  duplication:
    enabled: true
    checks:
      Similar code:
        enabled: false
For more information about disabling checks within engines, check out our comprehensive doc here.
Two flavors of duplication can raise issues:
Identical code: When two or more blocks of code contain the exact same variable names and structure.
Similar code: When two or more blocks of code contain the same structure, but have different contents (such as variable names or literal values). This can help catch cases where a developer has copied and pasted a section of code, leaving the structure the same but adjusting some variable names for a different context.
"Mass" refers to the size of the duplicated code. Specifically, mass is determined by the size of a code block's s-expression, after it has been parsed into a node of an Abstract Syntax Tree (AST).
You can identify a code snippet's mass by looking at issue reports in Code Climate's UI.
You can read our Analysis Concepts documentation on Duplication for further details.