Duplication
codeclimate-duplication is an engine which reports issues when it finds similar code blocks over a configurable mass threshold. The currently supported languages for this engine are:
- Ruby
- JavaScript
- Python
- PHP
- TypeScript
- Go
- Java
- Swift
The duplication engine is based on flay, an open-source, Ruby-only project developed by Ryan Davis.
Enable the Engine
To enable duplication analysis, add the following to your .codeclimate.yml configuration file, removing any languages which aren't present in your repository:
plugins:
  duplication:
    enabled: true
    config:
      languages:
      - ruby:
      - javascript:
      - php:
      - python:
You can also enable the engine via the CLI with:
$ codeclimate engines:enable duplication
This will create a default configuration file for you if you don’t already have one.
More information about the CLI is available in the README here: https://github.com/codeclimate/codeclimate
Configure the Engine
We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project guidelines.
The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.
If the engine reports duplication too readily, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.
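The effect of raising or lowering the threshold can be pictured with a minimal sketch. The `Block` type, the sample masses, and the inclusive comparison are assumptions made for illustration; this is not the engine's code.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """A hypothetical code block with an already-computed mass."""
    name: str
    mass: int


def blocks_to_compare(blocks: List[Block], mass_threshold: int) -> List[Block]:
    """Keep only blocks whose mass meets the threshold (assumed inclusive)."""
    return [b for b in blocks if b.mass >= mass_threshold]


# Made-up masses, chosen only to show the filtering effect.
blocks = [Block("helper", 12), Block("parser", 30), Block("report", 48)]

# A lower threshold admits smaller blocks, so more candidates are compared.
low = blocks_to_compare(blocks, mass_threshold=10)
# A higher threshold leaves only the largest blocks.
high = blocks_to_compare(blocks, mass_threshold=40)

print([b.name for b in low])   # ['helper', 'parser', 'report']
print([b.name for b in high])  # ['report']
```

With the lower value every block is a candidate for comparison, which is why lowering the threshold tends to surface more (and smaller) duplication issues.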
To adjust this setting, add a mass_threshold with your preferred value under a particular enabled language:
plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          mass_threshold: 20
        javascript:
In addition to mass thresholds, the Python duplication analysis can be configured to target a particular version of Python. By default, Python 2 is targeted, but Python 3 users can specify their version with a python_version key:
plugins:
  duplication:
    enabled: true
    config:
      languages:
        python:
          python_version: 3
If there are certain checks you would like to ignore, such as Similar Code, you can disable a check within the duplication engine as shown below:
plugins:
  duplication:
    enabled: true
    checks:
      Similar code:
        enabled: false
For more information about disabling checks within engines, check out our comprehensive doc here.
Understand the Engine
Two flavors of duplication can raise issues:
Identical code
When 2 or more blocks of code contain the exact same variable names and structure.
Similar code
When 2 or more blocks of code contain the same structure, but have different contents (such as variable names or literal values). This can help catch cases where a developer has copied and pasted a section of code, leaving the structure the same, but adjusting some variable names for a different context.
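The difference between the two flavors can be sketched with Python's standard `ast` module. This is an illustration only, not the engine's implementation: the `classify` helper and the `_Normalize` class are hypothetical, and the sketch normalizes only plain names, argument names, function names, and literal values.

```python
import ast


class _Normalize(ast.NodeTransformer):
    """Replace identifiers and literal values with placeholders."""

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value=None), node)


def classify(src_a: str, src_b: str) -> str:
    """Return 'identical', 'similar', or 'different' for two snippets."""
    tree_a, tree_b = ast.parse(src_a), ast.parse(src_b)
    # Same names and same structure.
    if ast.dump(tree_a) == ast.dump(tree_b):
        return "identical"
    # Same structure once names and literal values are erased.
    if ast.dump(_Normalize().visit(tree_a)) == ast.dump(_Normalize().visit(tree_b)):
        return "similar"
    return "different"


total = "def total(prices):\n    return sum(prices) * 2\n"
total_copy = "def total(prices):\n    return sum(prices) * 2\n"
area = "def area(sides):\n    return sum(sides) * 4\n"
largest = "def largest(sides):\n    return max(sides)\n"

print(classify(total, total_copy))  # identical
print(classify(total, area))        # similar
print(classify(total, largest))     # different
```

Here `area` has the same shape as `total` with different names and a different literal, which is the copy-and-adjust pattern the Similar code check is meant to surface.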
Mass
"Mass" refers to the size of the duplicated code. Specifically, mass is determined by the size of a code block's s-expression, after it has been parsed into a node of an Abstract Syntax Tree (AST).
You can identify a code snippet's mass by looking at issue reports in Code Climate's UI.
You can read our Analysis Concepts documentation on Duplication for further details.
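To build intuition for mass as a measure of syntax tree size, here is a rough sketch that counts AST nodes with Python's `ast` module. The `approximate_mass` function is hypothetical and the node count is only a stand-in: the engine works on s-expressions, so the numbers this sketch produces will not match the masses reported in the UI.

```python
import ast


def approximate_mass(source: str) -> int:
    """Count every node in the snippet's AST as a rough proxy for mass."""
    tree = ast.parse(source)
    return sum(1 for _ in ast.walk(tree))


small = "x = 1\n"
large = "x = 1\ny = x + 2\n"

# More syntax means more nodes, so the larger snippet has the larger count.
print(approximate_mass(small) < approximate_mass(large))  # True
```

The point is only the relationship: longer or more deeply nested code yields a larger tree, and therefore a larger mass to compare against the threshold.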
Per-Language Mass Threshold Defaults
Language | Default Mass Threshold |
---|---|
Ruby | 25 |
Python | 32 |
JavaScript | 45 |
PHP | 28 |
TypeScript | 45 |
Go | 100 |
Java | 40 |
Swift | 40 |
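As an illustration of how a configured mass_threshold relates to these defaults, the sketch below looks up a language's threshold and falls back to the table value when none is set. The `effective_threshold` helper and the dictionary layout are assumptions; how the engine resolves the value internally is not described here.

```python
# Default values copied from the table above.
DEFAULT_MASS_THRESHOLDS = {
    "ruby": 25,
    "python": 32,
    "javascript": 45,
    "php": 28,
    "typescript": 45,
    "go": 100,
    "java": 40,
    "swift": 40,
}


def effective_threshold(language: str, languages_config: dict) -> int:
    """Return the configured mass_threshold, or the default if none is set."""
    # A language enabled without settings (e.g. "javascript:") parses as None.
    settings = languages_config.get(language) or {}
    return settings.get("mass_threshold", DEFAULT_MASS_THRESHOLDS[language])


# Mirrors the "languages" mapping from the configuration example:
# ruby overrides its threshold, javascript is enabled with no settings.
languages_config = {"ruby": {"mass_threshold": 20}, "javascript": None}

print(effective_threshold("ruby", languages_config))        # 20
print(effective_threshold("javascript", languages_config))  # 45
```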