666 Python repos checked. 5409 open(…) calls. 11% were never closed!

Code Review Doctor
3 min read · Jan 28, 2022


It’s a problem that 11% of those open(…) calls never close the file, because leaving files open degrades application performance and increases the risk of unexpected side effects.

How did we find this statistic? At CodeReview.doctor we run static analysis checks on codebases, and we collect stats on the public repos we scan. We tracked every line we could find where developers opened a file via open(...) but never closed it, and compared that with every line where developers “did it right” (either explicitly calling close() or implicitly closing the file via a context manager).
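For illustration, here’s a minimal sketch of the kind of static check involved (not Code Review Doctor’s actual implementation): it uses Python’s ast module to flag open() calls that are not used as a context manager. A real tool would also have to track explicit close() calls, assignments and aliasing.

# Minimal sketch of this kind of check, not the real implementation:
# flag open() calls that are not used directly as a context manager.
import ast

def find_suspicious_opens(source):
    tree = ast.parse(source)
    # Collect the open() calls that appear directly in a `with` statement
    with_calls = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                with_calls.add(item.context_expr)
    # Report every other open() call as potentially never closed
    suspicious = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == 'open'
                and node not in with_calls):
            suspicious.append(node.lineno)
    return suspicious

print(find_suspicious_opens("data = open('foo.txt').read()"))  # [1]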

At the end of this article there’s a link to, and an embed of, the stats.

The problem

To recap: leaving files open degrades application performance and increases the risk of unexpected side effects.

Computers have finite resources, including limits on how many files can be open simultaneously. Leaving files open therefore consumes more of those finite file handles than necessary. Under extreme conditions this can exhaust the number of allowed open files, causing an “IOError: Too many open files" exception to be raised. Leaving files open also holds on to finite memory for longer than necessary. So for the performance of the application it is preferable to close files once you have finished with them.
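As a rough illustration of that failure mode (Unix-only, since it uses the resource module; the limit and the exact error message depend on the operating system, and the demo path is just a throwaway file):

# Rough sketch of file handle exhaustion. Unix-only: the resource module
# is not available on Windows, and the limit is OS-dependent.
import resource

soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f'This process may have at most {soft_limit} files open at once')

handles = []
try:
    for _ in range(soft_limit + 1):
        # Each unclosed open() consumes one more file descriptor,
        # even though they all point at the same throwaway path.
        handles.append(open('/tmp/handle_demo.txt', 'w'))
except OSError as exc:
    print(f'Failed after {len(handles)} open files: {exc}')  # Too many open files
finally:
    for handle in handles:
        handle.close()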

Python’s garbage collection will eventually close dangling unclosed files, but “eventually” should raise a red flag if the goal is to write predictable software. Indeed, CPython’s reference counting might implicitly close the file the instant it is no longer referenced, while other implementations might not do so for a long time, and under some conditions it might not happen at all. Pair this with the Zen of Python’s “Explicit is better than implicit” and it’s clear that it’s better not to leave files open.
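You can see this in action on CPython: the interpreter emits a ResourceWarning when it finalises a file object that was never closed. A small sketch, with warnings enabled so the warning is visible:

# CPython emits a ResourceWarning when an unclosed file object is finalised.
import gc
import warnings

warnings.simplefilter('always', ResourceWarning)

def leak():
    f = open('foo.txt', 'w')
    f.write('foo')
    # no close(): the file object becomes garbage when the function returns

leak()        # CPython's reference counting finalises f here and warns immediately
gc.collect()  # other implementations may only finalise it on a collection, or later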

Python can also be fuzzy about exactly when file changes are committed to disk: buffered data may not be written until the file is closed:

# bad: maybe not written to disk yet as the file was not closed
open('foo.txt', 'w').write('foo')
# also bad: maybe not written to disk as the file was not closed
f = open('bar.txt', 'w')
f.write('bar')
# better: written to disk when the file closed
f = open('baz.txt', 'w')
f.write('baz')
f.close()
# good: written to disk when the file closed and code is neater
with open('qux.txt', 'w') as f:
    f.write('qux')
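One more reason the last version is preferable, as a small sketch: the context manager closes the file even when the block raises an exception, whereas a plain close() call can be skipped entirely if an error occurs between open() and close():

# The context manager guarantees the close, even if the block raises
try:
    with open('qux.txt', 'w') as f:
        f.write('qux')
        raise ValueError('something went wrong mid-write')
except ValueError:
    pass

print(f.closed)  # True: the file was still closed despite the exception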

Windows (the operating system) locks files opened by Python. This can mean leaving files open is not thread safe: one thread holding a file open locks it, preventing other threads from opening it, even if they only want to read from it.

That’s why we give this advice: close files once you’re finished with them, ideally by opening them with a context manager.

The stats

Here’s a link to the 4812 examples of “doing it right”: all the files were closed either by explicitly calling close() or by opening the file using the context manager approach.

Here’s an embed of the 597 examples of not “doing it right”: the files were opened and read from but never closed (597 of the 5409 open(…) calls is where the 11% figure comes from):

Written by Code Review Doctor

I’m a GitHub bot that automatically improves your Python and Django code. https://codereview.doctor
