Build Your Own LOC Counter
This challenge is to build your own version of the tools cloc, sloc and scc. These tools count lines of code and produce statistics on the total number of lines in the source code, the lines of code, the lines of comments, the empty lines and so on.
Some also calculate the COCOMO 81 and COCOMO II estimates for the software being analysed. If you’re not familiar with it, the COCOMO model was developed by Barry W. Boehm to estimate the effort, cost and schedule for software projects. I wouldn’t rely on these numbers to plan a software project, but they’re an interesting tool to compare existing projects and get a feel for the size and scope of them.
Counting the lines of code in a software project sounds trivial and, quite honestly, seems like something you could do in a short bash command, for example:
% find . -name '*.go' | xargs wc -l | sort -nr
However, if you want to do it accurately and fast, you can get into some interesting computer science challenges. And when it comes to scc, I mean blazingly fast!
But Why Count Lines Of Code?
TL;DR: It’s useful as a gauge of the size and complexity of a project. If you want much more detail, Ben Boyter, the author of scc, wrote a blog post explaining why he put so much effort into building a tool to count lines of code.
The Challenge - Building A Tool To Count Lines Of Code
In this project, we’re going to build a tool to count lines of code in each file in a directory and its subdirectories.
The tool should identify blank, comment and code lines and be able to provide a value per file and a summary of the whole project with output in text or JSON.
For example, here’s the output of scc on the Go version of the Redis clone I use in my Build A Redis Server Master Systems Programming Through Practice course:
% scc
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Go                          11      2229      202        23     2004        429
License                      1        21        4         0       17          0
Markdown                     1         2        0         0        2          0
YAML                         1        25        6         0       19          0
───────────────────────────────────────────────────────────────────────────────
Total                       14      2277      212        23     2042        429
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $57,168
Estimated Schedule Effort (organic) 4.64 months
Estimated People Required (organic) 1.10
───────────────────────────────────────────────────────────────────────────────
Processed 62447 bytes, 0.062 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
Interesting to see that it estimates the cost of building the software as 57k USD!
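For the text or JSON output the tool should offer, one possible shape is sketched below in Python. The field names (file, lines, blanks, comments, code) and the JSON layout are assumptions made for illustration; they are not the format scc emits.

```python
import json

FIELDS = ("lines", "blanks", "comments", "code")


def render(results, as_json=False):
    """Render per-file counts plus a total, as plain text or as JSON.

    `results` is a list of dicts with the hypothetical keys
    file, lines, blanks, comments and code.
    """
    totals = {field: sum(r[field] for r in results) for field in FIELDS}
    if as_json:
        return json.dumps({"files": results, "total": totals}, indent=2)
    # Plain text: file name left-aligned, each count right-aligned.
    rows = [f"{'File':<41}" + "".join(f"{field.capitalize():>10}" for field in FIELDS)]
    for r in results:
        rows.append(f"{r['file']:<41}" + "".join(f"{r[field]:>10}" for field in FIELDS))
    rows.append(f"{'Total':<41}" + "".join(f"{totals[field]:>10}" for field in FIELDS))
    return "\n".join(rows)
```

Keeping the counting separate from the rendering like this means a new output format only needs a new branch in one function.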
Step Zero
Like all C-based programming languages, we’re zero-indexed at Coding Challenges! In this step you’re going to set your environment up ready to begin developing and testing your solution.
I’ll leave you to set up your IDE / editor of choice and programming language of choice. While you’re doing that, give some thought to the programming language or languages that you would find it useful to be able to count the lines of code for.
Step 1
In this step your goal is to scan a file system from a starting directory and identify all the source files in it.
Your program should accept a single argument for the starting directory. For this step I suggest you list out all the matching files. To check your code works you could pipe the output to a file and compare to the bash command:
% find . -name '*.go' | sort
Replace go with the extension of the programming language you’re going to focus on counting the code for.
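Here’s a minimal Python sketch of the directory scan, assuming source files can be recognised by their extension alone. The function name find_source_files and its extensions parameter are hypothetical names chosen for this example.

```python
import os


def find_source_files(root, extensions=(".go",)):
    """Return the sorted paths of files under `root` with a matching extension."""
    matches = []
    # os.walk visits the starting directory and every subdirectory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(tuple(extensions)):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_source_files(".") and printing one path per line should give a list you can compare with the find command above, although the sort order may differ depending on your locale.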
Step 2
In this step your goal is to differentiate between blank and non-blank lines in each file. You should scan each file in the directory and its subdirectories, then print either a summary (the default) or, optionally, a breakdown per file.
For example:
─────────────────────────────────────────────────────────────
File                                          Lines    Blanks
─────────────────────────────────────────────────────────────
internal/parser/parser.go                       463        78
~al/interpreter/interpreter.go                  334        63
internal/resolver/resolver.go                   248        41
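A minimal sketch of the blank versus non-blank count in Python, assuming a line is blank when it contains nothing but whitespace. The names count_lines and count_file are made up for this example.

```python
def count_lines(text):
    """Return (total_lines, blank_lines) for the contents of one file."""
    lines = text.splitlines()
    # A line counts as blank if nothing is left after stripping whitespace.
    blanks = sum(1 for line in lines if not line.strip())
    return len(lines), blanks


def count_file(path):
    """Read a file as text and count its lines, replacing undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_lines(handle.read())
```

Note that str.splitlines also splits on a few less common separators such as form feed, so totals can differ slightly from wc -l on unusual files.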
Step 3
In this step your goal is to split the non-blank lines into code and comments. As a first step, treat any line whose first non-whitespace character is the comment character as a comment.
If your language supports them (e.g. C, C++ or Java), don’t forget to support multi-line comments.
Once you have that, check your output and produce an updated report, for example:
───────────────────────────────────────────────────────────────────────────────
File                                         Lines   Blanks  Comments      Code
───────────────────────────────────────────────────────────────────────────────
internal/parser/parser.go                      463       78         1       384
~al/interpreter/interpreter.go                 334       63         5       266
internal/resolver/resolver.go                  248       41         0       207
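One way to sketch this first pass in Python is shown below. It assumes C-style comment markers as defaults and only looks at how each line starts, so a line with code followed by a comment still counts as code. Counting blank lines inside a multi-line comment as blanks is a choice made for this example; you may prefer to count them as comments.

```python
def classify_lines(lines, line_comment="//", block_start="/*", block_end="*/"):
    """Count blank, comment and code lines by looking at how each line starts."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    in_block = False  # True while we are inside a multi-line comment
    for raw in lines:
        line = raw.strip()
        if not line:
            counts["blank"] += 1
        elif in_block:
            counts["comment"] += 1
            if block_end in line:
                in_block = False
        elif line.startswith(line_comment):
            counts["comment"] += 1
        elif line.startswith(block_start):
            counts["comment"] += 1
            # Stay in the comment unless it also closes on this line.
            if block_end not in line[len(block_start):]:
                in_block = True
        else:
            counts["code"] += 1
    return counts
```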
Step 4
In this step your goal is to handle a line that may contain both code and the start of a multi-line comment, for example:
i++; /*
comment
*/
I would count this as one line of code and two lines of comment. The important thing is to recognise that a multi-line comment begins on the first line, and then not to count the last two lines as code.
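A sketch of one way to handle this in Python: scan each line character by character, remember whether we are inside a multi-line comment, and note whether any code was seen outside a comment. It assumes C-style markers and deliberately ignores string literals, so a comment marker inside a quoted string would be misread.

```python
def classify(lines, line_comment="//", block_start="/*", block_end="*/"):
    """Count blank, comment and code lines, allowing code and comments to share a line."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    in_block = False  # carried across lines: are we inside /* ... */ ?
    for raw in lines:
        line = raw.strip()
        if not line:
            counts["blank"] += 1
            continue
        has_code = False
        i = 0
        while i < len(line):
            if in_block:
                end = line.find(block_end, i)
                if end == -1:
                    break  # the rest of the line is inside the comment
                in_block = False
                i = end + len(block_end)
            elif line.startswith(block_start, i):
                in_block = True
                i += len(block_start)
            elif line.startswith(line_comment, i):
                break  # the rest of the line is a single-line comment
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        # Any code outside a comment makes the whole line a code line.
        counts["code" if has_code else "comment"] += 1
    return counts
```

For the three lines in the example above, this gives one line of code and two lines of comment.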
Step 5
In this step your goal is to handle plain text files and Markdown files. Plain text should be easy, just count blank and non-blank lines. For Markdown you should identify multi-line code sections and count the lines of code.
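For Markdown, a minimal Python sketch might track whether we are inside a fenced code section. It assumes code sections are delimited by lines starting with three backticks or three tildes, and it counts the fence lines themselves as text rather than code. Both are choices made for this example, and indented code blocks are not handled.

```python
# The two common Markdown fence markers, built without typing them literally.
FENCES = ("`" * 3, "~" * 3)


def count_markdown(lines):
    """Count blank, text and code lines in a Markdown document."""
    counts = {"blank": 0, "text": 0, "code": 0}
    open_fence = None  # the marker that opened the current code section, if any
    for raw in lines:
        line = raw.strip()
        if not line:
            counts["blank"] += 1
        elif open_fence is None:
            counts["text"] += 1
            for marker in FENCES:
                if line.startswith(marker):
                    open_fence = marker  # this line opens a code section
        elif line.startswith(open_fence):
            counts["text"] += 1
            open_fence = None  # this line closes the code section
        else:
            counts["code"] += 1
    return counts
```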
Step 6
In this step your goal is to handle one or more additional programming languages. Think carefully about how you do this. If you approach it the right way, it’s possible to make adding a new programming language a simple matter of adding some config to a languages file. Hint: a table-driven state machine!
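To illustrate the table-driven idea, here is a Python sketch in which each language is just an entry in a table and one generic counter reads its comment markers from that entry. The LANGUAGES table, its keys and the function names are hypothetical. In a real tool the table could live in a config file, and string literals would also need handling.

```python
import os

# Hypothetical language table: supporting a new language means adding one entry.
LANGUAGES = {
    ".go": {"name": "Go", "line": ["//"], "block": [("/*", "*/")]},
    ".c": {"name": "C", "line": ["//"], "block": [("/*", "*/")]},
    ".py": {"name": "Python", "line": ["#"], "block": []},
    ".sql": {"name": "SQL", "line": ["--"], "block": [("/*", "*/")]},
}


def language_for(path):
    """Look up the language entry for a path by extension, or None if unknown."""
    return LANGUAGES.get(os.path.splitext(path)[1].lower())


def count(lines, lang):
    """Count blank, comment and code lines using the markers in `lang`."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    closer = None  # end marker of the block comment we are inside, if any
    for raw in lines:
        line = raw.strip()
        if not line:
            counts["blank"] += 1
            continue
        has_code, i = False, 0
        while i < len(line):
            if closer is not None:
                end = line.find(closer, i)
                if end == -1:
                    break  # still inside the block comment at end of line
                i, closer = end + len(closer), None
                continue
            opened = next((b for b in lang["block"] if line.startswith(b[0], i)), None)
            if opened:
                i, closer = i + len(opened[0]), opened[1]
            elif any(line.startswith(marker, i) for marker in lang["line"]):
                break  # the rest of the line is a single-line comment
            else:
                has_code = has_code or not line[i].isspace()
                i += 1
        counts["code" if has_code else "comment"] += 1
    return counts
```

With this shape, the counting logic never mentions a specific language; everything language-specific sits in the table.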
Going Further
To take this further, configure support for multiple languages and make your project available for download. Be sure to provide instructions for users to add support for a programming language and open a pull-request on your repo.
Beyond that add support for calculating COCOMO for the code.
Help Others by Sharing Your Solutions!
If you think your solution is an example other developers can learn from, please share it: put it on GitHub, GitLab or elsewhere. Then let me know by sending me a message on the Discord Server, via Twitter or LinkedIn, or just post about it there and tag me. Alternatively, please add a link to it in the Coding Challenges Shared Solutions GitHub repo.