Adding pre-commit Hooks to Python Repo

Writing code can be tough, and writing clean code can be even more difficult sometimes. When you get on a roll and put together highly functional and imaginative code, it might not always look the greatest. Also, when crunched for time, it can be very difficult to go back over the code and attempt to make it as pretty as possible. You might also need to make sure other files used by your code are formatted properly. Next, add in multiple developers, and you’ve got yourself something that can grow uglier over time. Many of these issues can be addressed by using pre-commit hooks in your Python code’s GitHub repo.

In the Exporting CloudWatch Logs to S3 post, I wrote about creating some Python code for getting logs from CloudWatch to S3. I’ll be using this code as a basis for getting pre-commit set up and running. The idea is that upon every commit to the repo, the code will be evaluated by whatever hooks you’d like to use. My example will be using the following hooks: end-of-file-fixer, trailing-whitespace, flake8, black, and isort.

Installing pre-commit

Installing pre-commit is relatively simple and there are a number of ways already documented on the pre-commit website. You can confirm installation was successful by executing the command with the --version flag.

% pre-commit --version
pre-commit 2.20.0
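
If you want to perform the same check from a script, a minimal sketch could look like the following. The function names are my own and not part of pre-commit; the only thing assumed about the tool is that `pre-commit --version` prints a line shaped like the one above.

```python
from __future__ import annotations

import re
import subprocess


def parse_pre_commit_version(output: str) -> tuple[int, ...]:
    """Turn text such as 'pre-commit 2.20.0' into a tuple like (2, 20, 0)."""
    match = re.search(r"pre-commit (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_pre_commit_version() -> tuple[int, ...]:
    """Run `pre-commit --version` and parse what it prints.

    Raises FileNotFoundError if pre-commit is not on the PATH.
    """
    result = subprocess.run(
        ["pre-commit", "--version"], capture_output=True, text=True, check=True
    )
    return parse_pre_commit_version(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parsing can be tried out on a plain string without pre-commit being installed.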

Setting Up pre-commit

With pre-commit installed, the next step is to configure the hooks for your repo. You do this by first creating a .pre-commit-config.yaml file in the root of your GitHub repo. The basic format of this file is as follows:

repos:
-  repo: https://repo_url
   rev: some_version
   hooks:
   -  id: <hook_name>

  • repos is a YAML array that contains all of the repos, versions, and hooks to run.
    • repo is the URL of the GitHub repository that contains the code for the hooks.
    • rev is the specific tag/version that you plan to use from the specified repo.
    • hooks is a YAML array that contains the id/name of each hook you intend to use from the repo.
      • id is the identifier for each hook you plan to execute upon a commit.
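
To make the nesting concrete, here is a rough sketch that treats the configuration as a plain Python dictionary and checks for the keys described above. The `check_config` function is hypothetical and only covers this basic format; pre-commit ships its own `pre-commit validate-config` command for real validation.

```python
from __future__ import annotations


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in a pre-commit style config dict."""
    problems = []
    repos = config.get("repos")
    if not isinstance(repos, list):
        return ["'repos' must be a list"]
    for index, entry in enumerate(repos):
        # Every entry in the basic format needs these three keys.
        for key in ("repo", "rev", "hooks"):
            if key not in entry:
                problems.append(f"repos[{index}] is missing '{key}'")
        # Every hook listed under an entry needs an id.
        for hook in entry.get("hooks", []):
            if "id" not in hook:
                problems.append(f"repos[{index}] has a hook without an 'id'")
    return problems


# The same placeholder values used in the YAML skeleton above.
example = {
    "repos": [
        {
            "repo": "https://repo_url",
            "rev": "some_version",
            "hooks": [{"id": "<hook_name>"}],
        }
    ]
}
print(check_config(example))  # prints []
```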

Finding Entries for Your pre-commit Config

The next logical question is: how do I find this info? The pre-commit hooks list provides a list of supported hooks that can be included in this YAML file. Using the end-of-file-fixer hook as an example, let’s find it in the hooks list. You’ll note that it is a bullet item under the https://github.com/pre-commit/pre-commit-hooks repo. If you open this repo and navigate to the Tags link, you’ll see a list of versions that have been generated. We’re going to use the latest and greatest as of this posting, v4.3.0. At this point, you have everything you need to populate the .pre-commit-config.yaml file:

  • repo – https://github.com/pre-commit/pre-commit-hooks
  • rev – v4.3.0
  • id – end-of-file-fixer

Building Your pre-commit Config File

Taking the information gathered above, you would have an initial .pre-commit-config.yaml that looks like this:

repos:
-  repo: https://github.com/pre-commit/pre-commit-hooks
   rev: v4.3.0
   hooks:
   -  id: end-of-file-fixer

Continue the previous process for the other hooks noted above, and you’ll end up with a .pre-commit-config.yaml that looks like this:

repos:
-  repo: https://github.com/pre-commit/pre-commit-hooks
   rev: v4.3.0
   hooks:
   -  id: end-of-file-fixer
   -  id: trailing-whitespace
-  repo: https://github.com/PyCQA/flake8
   rev: 5.0.4
   hooks:
   -  id: flake8
-  repo: https://github.com/psf/black
   rev: 22.10.0
   hooks:
   -  id: black
-  repo: https://github.com/PyCQA/isort
   rev: 5.10.1
   hooks:
   -  id: isort
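
If you maintain several repos, you might prefer to generate this file rather than type it by hand. The sketch below is one way to do that with nothing but the standard library; `render_config` and `HOOKS` are names I made up, and the output simply mirrors the layout used in this post.

```python
def render_config(repos):
    """Render (repo_url, rev, hook_ids) tuples as pre-commit YAML text."""
    lines = ["repos:"]
    for repo_url, rev, hook_ids in repos:
        lines.append(f"-  repo: {repo_url}")
        lines.append(f"   rev: {rev}")
        lines.append("   hooks:")
        for hook_id in hook_ids:
            lines.append(f"   -  id: {hook_id}")
    return "\n".join(lines) + "\n"


# The repos, versions, and hook ids gathered in the previous sections.
HOOKS = [
    (
        "https://github.com/pre-commit/pre-commit-hooks",
        "v4.3.0",
        ["end-of-file-fixer", "trailing-whitespace"],
    ),
    ("https://github.com/PyCQA/flake8", "5.0.4", ["flake8"]),
    ("https://github.com/psf/black", "22.10.0", ["black"]),
    ("https://github.com/PyCQA/isort", "5.10.1", ["isort"]),
]

print(render_config(HOOKS), end="")
```

Redirecting the printed text into .pre-commit-config.yaml would give you the same file as above.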

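With this configuration built, we now need to run pre-commit install. This installs the git hook script that triggers pre-commit on every commit; the environments for the individual hooks are set up the first time they run.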

% pre-commit install
pre-commit installed at .git/hooks/pre-commit

With the above command in place, pre-commit will run the configured hooks against any files in a commit.
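
A quick way to confirm the install from a script is to look for the file that the command reported. This is only a sketch: `hook_is_installed` is a hypothetical helper, and it assumes a regular clone where .git is a directory (not a worktree or submodule).

```python
from pathlib import Path


def hook_is_installed(repo_root: str) -> bool:
    """Return True if a pre-commit hook script exists in the repo's .git/hooks."""
    return (Path(repo_root) / ".git" / "hooks" / "pre-commit").is_file()


if __name__ == "__main__":
    # Check the repository in the current working directory.
    print(hook_is_installed("."))
```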

Running pre-commit

Everything is configured and ready to go! Now we need to execute pre-commit against our files to see how they look. For the very first run and any time you add new hooks, you’ll want to run it with the --all-files switch so the hook(s) execute against all files. The default behavior is to execute against only files in the commit.

% pre-commit run --all-files
[INFO] Initializing environment for https://github.com/PyCQA/flake8.
[INFO] Initializing environment for https://github.com/psf/black.
[INFO] Initializing environment for https://github.com/PyCQA/isort.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/PyCQA/flake8.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/PyCQA/isort.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing main.py
Fixing modules/cloudwatch_to_s3.py

trim trailing whitespace.................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

main.py:16:51: E251 unexpected spaces around keyword / parameter equals
main.py:16:53: E251 unexpected spaces around keyword / parameter equals
main.py:16:80: E501 line too long (145 > 79 characters)
main.py:16:80: E251 unexpected spaces around keyword / parameter equals
main.py:16:82: E251 unexpected spaces around keyword / parameter equals
main.py:16:119: E251 unexpected spaces around keyword / parameter equals
main.py:16:121: E251 unexpected spaces around keyword / parameter equals
main.py:16:136: E251 unexpected spaces around keyword / parameter equals
main.py:16:138: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:15:1: E302 expected 2 blank lines, found 0
modules/cloudwatch_to_s3.py:15:30: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:15:32: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:18:21: E203 whitespace before ':'
modules/cloudwatch_to_s3.py:19:19: E203 whitespace before ':'
modules/cloudwatch_to_s3.py:23:1: E305 expected 2 blank lines after class or function definition, found 1
modules/cloudwatch_to_s3.py:26:1: E302 expected 2 blank lines, found 0
modules/cloudwatch_to_s3.py:26:32: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:26:34: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:34:1: E302 expected 2 blank lines, found 0
modules/cloudwatch_to_s3.py:35:80: E501 line too long (98 > 79 characters)
modules/cloudwatch_to_s3.py:42:1: E302 expected 2 blank lines, found 0
modules/cloudwatch_to_s3.py:42:80: E501 line too long (89 > 79 characters)
modules/cloudwatch_to_s3.py:61:1: E305 expected 2 blank lines after class or function definition, found 1
modules/cloudwatch_to_s3.py:65:1: E302 expected 2 blank lines, found 0
modules/cloudwatch_to_s3.py:69:20: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:69:22: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:85:80: E501 line too long (81 > 79 characters)
modules/cloudwatch_to_s3.py:86:80: E501 line too long (83 > 79 characters)
modules/cloudwatch_to_s3.py:102:1: E302 expected 2 blank lines, found 1
modules/cloudwatch_to_s3.py:102:39: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:102:41: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:105:46: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:105:48: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:107:80: E501 line too long (111 > 79 characters)
modules/cloudwatch_to_s3.py:108:14: E275 missing whitespace after keyword
modules/cloudwatch_to_s3.py:109:80: E501 line too long (132 > 79 characters)
modules/cloudwatch_to_s3.py:111:50: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:111:52: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:121:1: E305 expected 2 blank lines after class or function definition, found 1
modules/cloudwatch_to_s3.py:124:1: E302 expected 2 blank lines, found 0
modules/cloudwatch_to_s3.py:127:80: E501 line too long (94 > 79 characters)
modules/cloudwatch_to_s3.py:129:46: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:129:48: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:130:65: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:130:67: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:130:80: E501 line too long (91 > 79 characters)
modules/cloudwatch_to_s3.py:131:63: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:131:65: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:131:80: E501 line too long (87 > 79 characters)
modules/cloudwatch_to_s3.py:132:70: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:132:72: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:132:80: E501 line too long (127 > 79 characters)
modules/cloudwatch_to_s3.py:132:101: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:132:103: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:55: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:57: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:79: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:80: E501 line too long (224 > 79 characters)
modules/cloudwatch_to_s3.py:133:81: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:117: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:119: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:160: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:162: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:195: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:133:197: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:135:36: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:135:38: E251 unexpected spaces around keyword / parameter equals
modules/cloudwatch_to_s3.py:137:20: E203 whitespace before ':'
modules/cloudwatch_to_s3.py:138:27: E203 whitespace before ':'

black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted main.py
reformatted modules/cloudwatch_to_s3.py

All done! ✨ 🍰 ✨
2 files reformatted.

isort....................................................................Failed
- hook id: isort
- files were modified by this hook

Fixing ./cloudwatch_to_s3/main.py
Fixing ./cloudwatch_to_s3/modules/cloudwatch_to_s3

Wow! That is pretty ugly! Here’s the nice thing: notice the Fixing and reformatted statements in the output? They mean that some of the hooks identify problems AND fix them. Let’s run it again to see what couldn’t be fixed automatically:

% pre-commit run --all-files
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

modules/cloudwatch_to_s3.py:44:80: E501 line too long (81 > 79 characters)
modules/cloudwatch_to_s3.py:100:80: E501 line too long (81 > 79 characters)
modules/cloudwatch_to_s3.py:101:80: E501 line too long (83 > 79 characters)
modules/cloudwatch_to_s3.py:123:80: E501 line too long (111 > 79 characters)
modules/cloudwatch_to_s3.py:150:80: E501 line too long (94 > 79 characters)
modules/cloudwatch_to_s3.py:156:80: E501 line too long (85 > 79 characters)
modules/cloudwatch_to_s3.py:158:80: E501 line too long (81 > 79 characters)
modules/cloudwatch_to_s3.py:169:80: E501 line too long (88 > 79 characters)

black....................................................................Passed
isort....................................................................Passed
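
When the output gets long, it can help to boil it down to one status per hook. The following sketch does that for the Passed/Failed summary lines shown in this post; `summarize_run` is my own helper, and it deliberately ignores any other status text pre-commit might print.

```python
from __future__ import annotations

import re

# A summary line is a hook name, a run of dots, then the status.
STATUS_LINE = re.compile(r"^(?P<name>.+?)\.{3,}(?P<status>Passed|Failed)$")


def summarize_run(output: str) -> dict[str, str]:
    """Map each hook's display name to the status pre-commit printed for it."""
    results = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            results[match.group("name")] = match.group("status")
    return results
```

Fed the second run above, this would report flake8 as the only hook that still fails.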

Addressing The Results

It looks like the only things that remain are a few E501 line too long errors. The nice thing about this output is that I know the file and the line number in the file. Let’s start by looking at the last failure:

modules/cloudwatch_to_s3.py:169:80: E501 line too long (88 > 79 characters)

I like to start with the last error so that, if I have to split a line (a long comment, for example) into multiple lines, the line numbers reported for the earlier errors still match the file. Yes, I know I could just run pre-commit again, but then there’s laziness.
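
If you want the report itself in that bottom-up order, a small sketch like this can reorder it. The `bottom_up` function and its regular expression are my own; they assume flake8's default `path:line:col: code message` output format seen above.

```python
from __future__ import annotations

import re

FLAKE8_LINE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]\d+) (?P<text>.*)$"
)


def bottom_up(report: str) -> list[tuple[str, int, str]]:
    """Return (path, line, code) entries ordered from the last line to the first."""
    entries = []
    for raw in report.splitlines():
        match = FLAKE8_LINE.match(raw.strip())
        if match:
            entries.append((match["path"], int(match["line"]), match["code"]))
    # Group by path, then put the highest line number first within each file.
    return sorted(entries, key=lambda entry: (entry[0], -entry[1]))
```

Working through the list in this order means an edit never shifts the line number of an error you have not reached yet.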

After opening up modules/cloudwatch_to_s3.py, I can see that line 169 is:

        return {"task_id": export_task_id, "s3_export_path": s3_bucket_prefix_with_date}

This could be an easy fix: assign the dictionary to a variable and return the variable instead. But let’s move on from it for now.
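
For reference, the variable approach might look something like the sketch below. The function name `build_export_result` is a hypothetical stand-in for whatever function contains line 169; only the two variable names and the dictionary keys come from the original line.

```python
def build_export_result(export_task_id, s3_bucket_prefix_with_date):
    """Hypothetical stand-in for the function that contains line 169."""
    # Build the dictionary across several short lines, then return the variable.
    export_result = {
        "task_id": export_task_id,
        "s3_export_path": s3_bucket_prefix_with_date,
    }
    return export_result
```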

I was able to fix most of the lines by splitting them into multiline comments, except the ones reported for lines 100, 101, 156, and 169. One could argue that I could make the exception messages shorter in lines 100 and 101 or split 156 and 169 into multiple lines, but I’m going to take a different approach. I’m going to say that I’m OK with lines that are up to 88 characters instead of the default 79-character limit imposed by flake8.
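
To see what that choice means in practice, here is a simplified sketch of a length check in the spirit of E501. It is not flake8's implementation; `long_lines` is my own function that just compares each line's length to a limit. As an aside, 88 also happens to be Black's default line length, so the two tools end up agreeing.

```python
def long_lines(source: str, max_length: int = 79):
    """Yield (line_number, length) for every line longer than max_length."""
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_length:
            yield number, len(line)


# The offending line 169 from above, split here only to keep this example short.
line_169 = (
    '        return {"task_id": export_task_id, '
    '"s3_export_path": s3_bucket_prefix_with_date}'
)
print(list(long_lines(line_169)))                 # flagged under the default limit
print(list(long_lines(line_169, max_length=88)))  # not flagged with a limit of 88
```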

Tweaking the Hook

Looking at the flake8 arguments list, you’ll see an argument called max-line-length that allows you to supply a different value. Let’s first show a run with everything else corrected:

% pre-commit run --all-files
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

modules/cloudwatch_to_s3.py:44:80: E501 line too long (81 > 79 characters)
modules/cloudwatch_to_s3.py:100:80: E501 line too long (81 > 79 characters)
modules/cloudwatch_to_s3.py:101:80: E501 line too long (83 > 79 characters)
modules/cloudwatch_to_s3.py:158:80: E501 line too long (85 > 79 characters)
modules/cloudwatch_to_s3.py:160:80: E501 line too long (81 > 79 characters)
modules/cloudwatch_to_s3.py:171:80: E501 line too long (88 > 79 characters)

black....................................................................Passed
isort....................................................................Passed

Each hook accepts an args array for passing arguments to the underlying tool. We add the max-line-length argument to the flake8 hook in our .pre-commit-config.yaml file like this:

repos:
-  repo: https://github.com/pre-commit/pre-commit-hooks
   rev: v4.3.0
   hooks:
   -  id: end-of-file-fixer
   -  id: trailing-whitespace
-  repo: https://github.com/PyCQA/flake8
   rev: 5.0.4
   hooks:
   -  id: flake8
      args:
      -  "--max-line-length=88"
-  repo: https://github.com/psf/black
   rev: 22.10.0
   hooks:
   -  id: black
-  repo: https://github.com/PyCQA/isort
   rev: 5.10.1
   hooks:
   -  id: isort

Now with this addition, let’s run the check again:

% pre-commit run --all-files
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
flake8...................................................................Passed
black....................................................................Passed
isort....................................................................Passed

Everything passed! Now we’ll commit these changes and have nice pretty code for others to read.

% git commit -a
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
flake8...................................................................Passed
black....................................................................Passed
isort....................................................................Passed
[main 1fd46b3] Code all fixed up with precommits
 2 files changed, 90 insertions(+), 54 deletions(-)

Upon attempting the commit, our pre-commit hooks also run again.
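
The reason this matters is that a failing hook stops the commit from going through. The sketch below is only a model of that rule, not pre-commit's code; `failing_hooks` and `commit_allowed` are hypothetical names, and the exit codes are the kind reported in the runs above.

```python
def failing_hooks(exit_codes):
    """Return the names of hooks that exited with a non-zero status."""
    return [name for name, code in exit_codes.items() if code != 0]


def commit_allowed(exit_codes):
    """A commit goes through only when every hook exited with status 0."""
    return not failing_hooks(exit_codes)


# Before the max-line-length tweak, flake8 exited with 1 and would block a commit.
before = {"end-of-file-fixer": 0, "trailing-whitespace": 0, "flake8": 1}
print(failing_hooks(before), commit_allowed(before))
```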