Let’s talk about how to format code and name things.
I remember when PSR-1 and PSR-2 became a thing. Jeez that “use only spaces” thing was just ridiculous, I knew that tabs where better (right?). And those crazy formatting rules were the opposite of what I was used to. Nonsense! One day I did the jump and after a few weeks, I just didn’t care anymore. I was over it, just like everyone else did, and PSR-2 was my new religion.
Fast forward a few years and I’m at a talk where the speaker argues to use snake case for naming test methods. Dude, you crazy? We just finally got consistent style across all the PHP world! And we all know that camelCase looks much better right? How can I go against PSR-2? It turns out, after trying it, that this was a wonderful idea…
Habits are sometimes making us blind. We think X looks prettier than Y but that’s just the habit speaking. In this article I’ll try to take a rational approach at coding style. That means leaving the “it looks ugly/better” at the door.
If at any point you feel like something “just doesn’t look good”, breath in, breath out, and try it! Nothing beats hands-on experience, not even some random article on the internet :)
Should we use trailing commas or not? If you are not familiar with the practice, here is an array using trailing commas:
1 2 3 4
As you can see, the last item ends with a comma even though there is no extra item after it. It’s perfectly valid PHP, the trailing comma is simply ignored. Now consider this example which is not using trailing commas:
1 2 3 4
Let’s add a new value to the array:
1 2 3 4 5
Here is the diff we generated in the commit:
1 2 3 4 5 6
With trailing commas, the diff looks like this instead:
1 2 3 4 5
As you can see, trailing commas lead to simpler and cleaner commits. This is also very useful when using
git blame: each line points to the real commit that added it. Conclusion: use trailing commas.
It’s interesting to note that there’s currently a RFC to allow trailing commas everywhere in PHP (not just arrays). Obviously, I think it would be a great addition to the language for the reasons explained above, as well as for the sake of consistency.
This practice is fairly common, though not universal. PSR-2 doesn’t state anything about it, but many projects enforce such rule. Here is an example with phpdoc:
1 2 3 4 5
And here is another one with arrays:
1 2 3 4 5
Or for assignments:
1 2 3
Anyone who has ever modified such code knows: it’s a pain. Sure it may “look good”, but when modifying that you have to fill in all the extra spaces to keep the alignment. One may argue that an IDE can re-arrange that for us, but even then let’s look at the diff when adding a new item to an array:
1 2 3 4 5 6 7 8 9
Now all the
git blame information is lost and the commit is almost unreadable.
Alternatively, if you don’t bother keeping the alignment:
1 2 3 4 5 6
The whole point of the alignement is lost, and there goes consistency.
Another example following an IDE refactoring (
LoggerInterface was renamed to
1 2 3 4 5
We have all seen that!
To sum up on aligning values:
- it requires extra work to maintain
- it messes up diffs and
- it leads to inconsistent alignment over time
Conclusion: do not align things with spaces.
Using phpdoc to document classes, functions, methods, etc. is good practice. However code considered as “well documented” usually looks like this (on GitHub):
1 2 3 4 5 6 7 8
There is much content in this docblock, but most of it is duplicated from what developers or tools can already get from the source:
- we get that it’s a constructor, the method is
- we get that
HttpKernelInterface, the parameter is type-hinted
- we get that a
HttpKernelInterfacetype-hint means that the parameter must be “An HttpKernelInterface instance” (this comment has no added value)
In reality here is what the docblock provides that isn’t provided by the code itself:
$cacheDiris a string (or null)
$cacheDiris null, the default cache directory will be used
The docblock could be reduced to this:
1 2 3 4 5
On top of being information overload, the duplication also becomes a problem when the code changes. Everbody has seen a docblock that doesn’t match the method it describes. When information is duplicated this is much more likely to happen.
All of this is becoming even more interesting with PHP 7 (coming with scalar type-hints and return types). Here is another example to illustrate that:
1 2 3 4 5 6 7 8 9 10 11
With PHP 7 the docblock would become entirely useless:
1 2 3 4 5 6
Conclusion: use docblocks only to add information.
The “Interface” suffix
Let’s take that example:
1 2 3 4 5
One would argue here that the fact that we ask for an interface explicitly (it’s visible through the name) is good: we know we ask for an interface. If the type-hint was for
Cache, we wouldn’t be sure whether we are asking for an interface or for an implementation.
My answer to that is that from the consumer’s point of view, it doesn’t matter if it’s an interface or an implementation. What matters is that
Foo needs a cache, end of story. There’s a principle behind interfaces: either you have one implementation and you don’t need an interface, either you have many implementations and you create an interface.
If for some reason there’s only one implementation of the cache, then fine: just give me an instance of the class! If however there are multiples implementation of the cache, then each implementation exists for a specific reason. And that should be visible in the name. There is no reason there would be one implementation named
Cache. There would be
In the end, when we use the name
Cache we either type-hint against the one implementation (no interface exists), or either against the interface. All is well!
I believe we have issues with this because :
- we are sometimes tempted to create interfaces when it’s not needed, just because interfaces sound like bonus points towards clean code (thus leading to
Foo implements FooInterface)
- it requires coming up with unique names for implementations, and naming things is hard
But even though this is challenging our habits, getting rid of
*Interface forces us to think: better names and no unneeded interfaces.
For example if
UserRepository is an interface, then you are forced to find a more specific name for your implementation. And you come up with
DoctrineUserRepository and you realize that there could as well be
InMemoryUserRepository. Interfaces makes much more sense when they are the default. Implementations are secondary.
Let’s keep in mind however that in some cases (for example in libraries/frameworks), interfaces are introduced only to allow a third party to replace the default implementation. We then have an interface with one implementation (other implementations are to be written by framework users). Given that context isn’t the same as what was described above, it’s harder to apply the same principles. In that case I can only advise to use critical thinking :)
The “Exception” suffix
What, again the suffixes? Yes!
this section is inspired from Kevlin’s mind-blowing talk
Here is where we use exceptions in PHP:
- when creating an exception class by extending another one
That means that everywhere an exception class will appear, we will know it’s an exception. Having the
Exception suffix is then completely redundant.
Let’s take an example:
UserNotFoundException. The suffix brings absolutely no value. Even worse, it makes the (perfectly valid) sentence more obscure:
UserNotFound is everything we need.
What’s even more interesting is that, in some cases, removing the
Exception suffix makes the name look quite bad. As an exercise, let’s remove the suffix from Symfony Form’s exceptions. By the way I’m taking examples from Symfony not because I believe it is bad code, but on the contrary because it is a very good code base (thus making the point stronger).
First let’s start with the exceptions that would make perfect sense without the suffix:
Those sound like perfectly valid english sentences that explain an error.
Now let’s look at those that sound more like an information than an exception or error:
Imagine that situation in real life: you want to get on the bus, but the driver tells you “no, somebody already got on”. You would probably ask “So what?”. The driver’s problem is maybe that the bus is full. It’s the same here: maybe a form cannot be re-bound if it has already been bound.
CannotSubmitAgain as exception names instead?
Lastly, let’s look at the exceptions left:
ErrorMappingException(notice the combo error + exception)
LogicException: exception/error in the logic
RuntimeException: exception/error at runtime
StringCastException: exception/error while casting to string?
LogicException which are very generic exceptions (maybe they deserve to be more specific?), here are suggestions for the other two:
Not convinced? Have a look at your own code and do the same exercise, it might give you a new persepective on your errors.
Exception suffix is unnecessary. Not using it also has the benefit of forcing us to come up with names that better describe an actual error.
In the end we only had a look at 5 actual examples, but I want to stress that the main point of this article is:
- it’s possible to think about coding style logically
- sometimes doing so forces us to challenge our habits
- when unsure or dubious: just try
If you need to vent off on how some of this is stupid and ugly, there’s a comment box below. I would also be happy to hear about practices you tried and you liked!