Duplication

codeclimate-duplication is an engine which reports issues when it finds similar code blocks over a configurable mass threshold. The currently supported languages for this engine are:

  • Ruby
  • JavaScript
  • Python
  • PHP
  • TypeScript
  • Go
  • Java
  • Swift

The duplication engine is based on flay, an open source, Ruby-only project developed by Ryan Davis.

Enable the Engine

To enable duplication analysis, add the following to your .codeclimate.yml configuration file, removing any languages which aren't present in your repository:

plugins:
  duplication:
    enabled: true
    config:
      languages:
      - ruby
      - javascript
      - php
      - python

You can also enable the engine via the CLI with:

$ codeclimate engines:enable duplication

This will create a default configuration file for you if you don’t already have one.

More information about the CLI is available in the README here: https://github.com/codeclimate/codeclimate

Configure the Engine

We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project guidelines.

The threshold configuration represents the minimum mass a code block must have to be analyzed for duplication. The lower the threshold, the more fine-grained the comparison.

If the engine reports duplication too readily, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.
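
The effect of raising or lowering the threshold can be pictured with a small Python sketch. The block names and masses below are hypothetical, and the sketch assumes that a block whose mass equals the threshold is still analyzed; it illustrates the idea and is not the engine's implementation.

```python
# Hypothetical masses for four code blocks; real values come from the engine.
block_masses = {"helper_a": 12, "helper_b": 27, "render_row": 48, "parse_args": 31}


def blocks_to_analyze(masses, mass_threshold):
    """Return the names of blocks large enough to be compared for duplication.

    Assumption: a block qualifies when its mass is at least the threshold.
    """
    return sorted(name for name, mass in masses.items() if mass >= mass_threshold)


# A lower threshold admits more (and smaller) blocks into the comparison.
print(blocks_to_analyze(block_masses, 20))  # ['helper_b', 'parse_args', 'render_row']
# A higher threshold leaves only the largest blocks.
print(blocks_to_analyze(block_masses, 40))  # ['render_row']
```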

To adjust this setting, add a mass_threshold with your preferred value under a particular enabled language:

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          mass_threshold: 20
        javascript:

In addition to mass thresholds, the Python duplication analysis can be configured to target a particular version of Python. By default, Python 2 is targeted, but Python 3 users can specify their version with a python_version key:

plugins:
  duplication:
    enabled: true
    config:
      languages:
        python:
          python_version: 3

If there are certain checks you would like to ignore, such as Similar code, you can disable them within the duplication engine as shown below:

plugins:
  duplication:
    enabled: true
    checks:
      Similar code:
        enabled: false

For more information about disabling checks within engines, check out our comprehensive doc here.

Understand the Engine

Two flavors of duplication can raise issues:

Identical code

When 2 or more blocks of code contain the exact same variable names and structure.

Similar code

When 2 or more blocks of code contain the same structure, but have different contents (such as variable names or literal values). This can help catch cases where a developer has copied and pasted a section of code, leaving the structure the same, but adjusting some variable names for a different context.
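
The difference between the two flavors can be sketched in Python with the standard ast module. This is only an illustration of the idea: the classify function and the _Normalize helper are hypothetical names, the normalization covers only simple names, arguments, function names, and literals, and the engine's actual comparison may work differently.

```python
import ast


class _Normalize(ast.NodeTransformer):
    """Replace identifiers and literal values with fixed placeholders."""

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_Constant(self, node):
        node.value = 0
        return node


def classify(source_a, source_b):
    """Label two snippets as 'identical', 'similar', or 'different'."""
    tree_a, tree_b = ast.parse(source_a), ast.parse(source_b)
    # Same names and same structure.
    if ast.dump(tree_a) == ast.dump(tree_b):
        return "identical"
    # Same structure once names and literal values are ignored.
    if ast.dump(_Normalize().visit(tree_a)) == ast.dump(_Normalize().visit(tree_b)):
        return "similar"
    return "different"


first = "def total(items):\n    return sum(items) + 1\n"
second = "def count(rows):\n    return sum(rows) + 2\n"
third = "def total(items):\n    for item in items:\n        print(item)\n"

print(classify(first, first))   # identical
print(classify(first, second))  # similar
print(classify(first, third))   # different
```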

Mass

"Mass" refers to the size of the duplicated code. Specifically, mass is determined by the size of a code block's s-expression, after it has been parsed into a node of an Abstract Syntax Tree (AST).
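
As a rough analogy, the following Python sketch counts the nodes in a snippet's AST using the standard ast module. The approximate_mass function is a hypothetical helper; the engine derives mass from its own s-expression representation, so these counts will not match the values it reports and only show that larger blocks yield larger masses.

```python
import ast


def approximate_mass(source):
    """Count every node in the parsed tree as a rough stand-in for mass."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


small = "x = 1"
large = "x = 1\ny = x + 2\nprint(x, y)"

# The longer snippet parses into more nodes, so its approximate mass is higher.
print(approximate_mass(small) < approximate_mass(large))  # True
```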

You can identify a code snippet's mass by looking at issue reports in Code Climate's UI.

You can read our Analysis Concepts documentation on Duplication for further details.

Per-Language Mass Threshold Defaults

Language     Default Mass Threshold
Ruby         25
Python       32
JavaScript   45
PHP          28
TypeScript   45
Go           100
Java         40
Swift        40