DTOs in CakePHP

Data Transfer Objects (DTOs) have been around in PHP for some time.
Martin Fowler described the pattern in his architecture book and in blog posts as far back as 2004.
This article from 2010 also describes the idea.
They are also heavily used in other frameworks, which suggests that the performance overhead of setting and getting their values is not too relevant compared to the benefits they bring to larger applications.

They are not used much in CakePHP yet, for several reasons.
For one, entities already cover the primary use case: mapping database rows into objects to work with inside the application.
They can also be used, to some extent, to store form data and other transient information alongside the actual database fields.
For most other cases, people tend to use unstructured associative arrays, which are certainly fast and can easily contain nested data.

The problems with this approach become clear once you use these arrays a lot:

  • Always having to debug()/dd() the returned array to know what data is available (keys, nested info, …)
  • Refactoring is a painful task, as your IDE cannot report "usages" of the keys, so you will most likely miss half of them (who has 99+% test coverage?).
  • At any time a key/value could be missing without you knowing it, leading to unexpected behavior.
  • No type safety: with PHP 7.1+ type hints now used in vendor and project code, this is a time bomb waiting to go off.

DTOs for better code quality

DTOs can help you put your unstructured arrays into a clean structure:

  • You clearly see which fields are available and what type each field has (with an IDE, full type hinting and autocomplete make this very quick).
  • You get a clear exception/error for fields that are unexpected, or missing where you need them: early on, instead of somewhere down the line.
  • Refactoring is now a piece of Cake: Shift+F6 in PhpStorm, for example, finds all occurrences with 100% accuracy (no false positives like with a regex on string keys) and renames them with one click, even if hundreds of src/ or test/ files are involved.
  • Snippets/PRs are easier to review, as the methods now expose a cleaner API: get() with a nullable return vs. getOrFail() with a non-nullable return.
  • Static analysis tools like PHPStan/Psalm can now more easily help you find bugs and errors, as they know what the data contains and what can be used in which scope.
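To make the first two points more concrete, here is a minimal hand-written sketch of what a DTO buys you over an array: typed getters and an early error for an unexpected key (such as a typo) or a missing required one. The class OwnerSketchDto is hypothetical and only mirrors the Owner definition shown further down; it is not the code the Dto plugin generates.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical hand-written DTO (not the plugin's generated code):
 * typed getters plus early errors for unexpected or missing fields.
 */
final class OwnerSketchDto
{
    private const FIELDS = ['name', 'birthYear'];

    private string $name;
    private ?int $birthYear;

    /**
     * @param array<string, mixed> $data
     */
    public function __construct(array $data)
    {
        // Fail early on keys this DTO does not define (e.g. a typo like "nmae").
        $unknown = array_diff(array_keys($data), self::FIELDS);
        if ($unknown !== []) {
            throw new InvalidArgumentException('Unexpected field(s): ' . implode(', ', $unknown));
        }
        // "name" is treated as required, so its absence is reported right here.
        if (!isset($data['name'])) {
            throw new InvalidArgumentException('Missing required field: name');
        }
        $this->name = (string)$data['name'];
        $this->birthYear = isset($data['birthYear']) ? (int)$data['birthYear'] : null;
    }

    public function getName(): string
    {
        return $this->name;
    }

    // Optional field: the nullable return type tells the IDE and
    // PHPStan/Psalm that callers must handle null.
    public function getBirthYear(): ?int
    {
        return $this->birthYear;
    }
}
```

With an array, the typo would simply produce an extra key and a later `null` read somewhere else; here it fails at the point where the data enters the object.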

The CakePHP plugin for this is Dto, a generator for your custom DTOs.
It lets you define and generate DTOs very quickly, so that you can focus on using them rather than on writing or updating tons of classes.

Main use cases

It is important to state that DTOs are not meant to replace your entities! Even though they have advantages over entities in quite a few cases, they are meant as an additional layer of useful objects for places where entities are not a good fit.
This mainly means mapping external (API) data into internal data structures that are self-describing, for both humans and tooling.

They can also be useful when you have a custom aggregation of data that goes beyond a single entity.

Definitions as XML

By default, XML is used for defining the DTOs. This has several advantages:

  • Very quick to write out; let the generator do the rest. No need to code basic getter/setter logic for tons of fields.
  • Very readable, and very easy to adjust or extend.
  • Validation against an XSD also reports issues early on.
  • You can focus on defining and directly using DTOs, which makes you super-productive. Rapid development on steroids.

A basic DTO group could be defined like this:

<dto name="Car">
    <field name="color" type="string" />
    <field name="attributes" type="string[]" />
    <field name="isNew" type="bool" />
    <field name="distanceTravelled" type="int" />
    <field name="value" type="float" />
    <field name="manufactured" type="\Cake\I18n\FrozenDate" />
    <field name="owner" type="Owner"/>
</dto>

<dto name="Cars">
    <field name="cars" type="Car[]" collection="true" singular="car" />
</dto>
    
<dto name="Owner">
    <field name="name" type="string"/>
    <field name="birthYear" type="int"/>
</dto>
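The `collection="true" singular="car"` part is what gives you an adder for single items. As a rough, hand-written illustration of the resulting API, here is a sketch of the Car/Cars pair as plain PHP classes. The class names are hypothetical, the generated code will differ (it may, for instance, use a collection object rather than a plain array), and only the addCar() naming follows from the singular attribute.

```php
<?php
declare(strict_types=1);

// Hypothetical, hand-written stand-ins for the generated Car and Cars DTOs.
final class CarSketchDto
{
    private string $color;

    public function __construct(string $color)
    {
        $this->color = $color;
    }

    public function getColor(): string
    {
        return $this->color;
    }
}

final class CarsSketchDto
{
    /** @var CarSketchDto[] */
    private array $cars = [];

    // Singular adder, named after singular="car"; only accepts Car objects.
    public function addCar(CarSketchDto $car): self
    {
        $this->cars[] = $car;

        return $this;
    }

    /** @return CarSketchDto[] */
    public function getCars(): array
    {
        return $this->cars;
    }

    public function hasCars(): bool
    {
        return $this->cars !== [];
    }
}
```

Because addCar() is typed, passing an Owner or a raw array by mistake fails immediately instead of ending up somewhere in a nested array.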

An additional advantage is the DRY (Don’t Repeat Yourself), self-describing way of defining default values, required fields, deprecations, etc.

Practical use cases

One of the use cases where I leverage DTOs most is mapping chaotic Jira/GitHub API data (JSON => array) into internal data (array => nested, self-describing objects).

See sandbox/dto-examples for a live demo of these examples. Check the source code to see how readable and easy it is to work with this in IDEs, compared to manual arrays with unclear keys and nested data.

Instead of debugging in the view to find out which fields are available at which nesting level, you can now use autocomplete:

Before:

<?php echo h($pullRequest['head']['ref']) . ':' . h($pullRequest['head']['sha']); ?>
by <?php echo h($pullRequest['head']['user']['login']) ?>

After:

<?php echo h($pullRequestDto->getHead()->getRef()) . ':' . h($pullRequestDto->getHead()->getSha()); ?>
by <?php echo h($pullRequestDto->getHead()->getUser()->getLogin()) ?>

Type hinting also lets you and the IDE know which fields are definitely set (required, as in not nullable) and can be used directly.
For the others (nullable), it is better to use get...OrFail() when chaining.
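Why the OrFail variant matters for chaining can be shown with a small hand-written sketch of the pull request example. HeadSketchDto and UserSketchDto are hypothetical stand-ins, not generated classes; the point is only the pair of a nullable getter and a non-nullable getUserOrFail() that throws a descriptive exception.

```php
<?php
declare(strict_types=1);

// Hypothetical hand-written DTOs mirroring the pull request example.
final class UserSketchDto
{
    private string $login;

    public function __construct(string $login)
    {
        $this->login = $login;
    }

    public function getLogin(): string
    {
        return $this->login;
    }
}

final class HeadSketchDto
{
    private ?UserSketchDto $user;

    public function __construct(?UserSketchDto $user)
    {
        $this->user = $user;
    }

    // Nullable: safe to call, but the caller has to handle null.
    public function getUser(): ?UserSketchDto
    {
        return $this->user;
    }

    // Non-nullable: safe to chain, fails loudly if the field is not set.
    public function getUserOrFail(): UserSketchDto
    {
        if ($this->user === null) {
            throw new RuntimeException('Field "user" is not set');
        }

        return $this->user;
    }
}
```

With the plain nullable getter, `$head->getUser()->getLogin()` on a missing user would instead end in PHP's generic "Call to a member function getLogin() on null" error, and static analyzers would flag the chain as unsafe.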

Let’s imagine a case where this becomes even clearer: incoming API data with some required and some optional fields:

Before:

<div>
    <?php echo nl2br(h($apiResult['description'])); ?>
</div>
<div>
<?php if (!empty($apiResult['modified'])) {
    echo $this->Time->nice($apiResult['modified']);
} ?>
 by <?php if (!empty($apiResult['username'])) {
    echo h($apiResult['username']);
} else {
    echo h(__('anonymous'));
} ?>
</div>

After:

<div>
    <?php echo nl2br(h($apiDto->getDescriptionOrFail())); ?>
</div>
<div>
<?php if ($apiDto->hasModified()) {
    echo $this->Time->nice($apiDto->getModified());
} ?>
 by <?php echo h($apiDto->getUsername() ?: __('anonymous')); ?>
</div>

If needed, you can check optional fields using the "has" methods; otherwise, you can directly use the self-describing nullable or non-nullable getters.

Inflection usage

Another important feature is handling different incoming formats (inflections) of field names.
From forms or the DB you usually get them underscored, as field_name. URL query strings often pass them dashed, as field-name.
And options arrays often pass them camelBacked, as fieldName.

With the generated DTOs you can normalize them all into the same structure:

$myDto = new MyDto();

// manual assignment
$myDto->setMinValue($this->request->getQuery('min-value'));
...

// or a bit shorter for multiple query strings:
$myDto->fromArray($this->request->getQuery(), true, MyDto::TYPE_DASHED);

In a similar way, you can take the fields of POST/DB data and then work with them in a more self-describing way:

$article = new ArticleDto();
$article->fromArray($articleEntity->toArray(), false, $article::TYPE_UNDERSCORED);
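The idea behind the inflection types can be sketched in plain PHP: every incoming key is converted to camelBack and then mapped onto the matching setter. This is a hypothetical illustration, not the plugin's implementation; toCamelBack(), FilterSketchDto and fromArraySketch() are made-up names, and the $ignoreUnknown flag is only an analogy to the boolean argument passed to fromArray() above.

```php
<?php
// No strict_types here on purpose: in PHP's default coercive mode, numeric
// query strings such as "5" are converted to int when passed to ?int setters.

/** Converts field_name, field-name or fieldName to camelBacked fieldName. */
function toCamelBack(string $key): string
{
    $parts = preg_split('/[_-]/', $key);
    $first = array_shift($parts);

    return $first . implode('', array_map('ucfirst', $parts));
}

/** Hypothetical DTO; fromArraySketch() is not the plugin's fromArray(). */
final class FilterSketchDto
{
    private ?int $minValue = null;
    private ?int $maxValue = null;

    public function setMinValue(?int $minValue): self
    {
        $this->minValue = $minValue;
        return $this;
    }

    public function setMaxValue(?int $maxValue): self
    {
        $this->maxValue = $maxValue;
        return $this;
    }

    public function getMinValue(): ?int
    {
        return $this->minValue;
    }

    public function getMaxValue(): ?int
    {
        return $this->maxValue;
    }

    /**
     * Maps keys in any of the three inflections onto the setters.
     * Unknown keys are skipped when $ignoreUnknown is true, otherwise rejected.
     */
    public function fromArraySketch(array $data, bool $ignoreUnknown = false): self
    {
        foreach ($data as $key => $value) {
            $setter = 'set' . ucfirst(toCamelBack((string)$key));
            if (!method_exists($this, $setter)) {
                if ($ignoreUnknown) {
                    continue;
                }
                throw new InvalidArgumentException(sprintf('Unknown field "%s"', $key));
            }
            $this->$setter($value);
        }

        return $this;
    }
}
```

In a controller, this would be fed with something like `$this->request->getQuery()`, so that `?min-value=5` ends up as `getMinValue() === 5` regardless of which inflection the source used.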

Associative collections

This is a neat feature that also comes out of the box with generated DTOs.
The attribute associative="true" turns the field into a basic collection that focuses on keys rather than values.

Let’s imagine we want to map color codes to human-readable names inside some MapDto:

<dto name="Map">
    <field name="colors" type="string[]" singular="color" associative="true" />
</dto>

Now we have self-describing methods to work with this easily:

// Example for adding associated array items
$mapDto->addColor('#EEEEEE', 'gray');
$mapDto->addColor('#FFFFFF', 'white');
  
// Example for setting associated array
$mapDto->setColors([
    '#000000' => 'black',
    '#0000FF' => 'blue', 
]);

// Example for getting associated items
$mapDto->getColor('#EEEEEE'); // returns the associated value to this color
$mapDto->getColor('#123456'); // throws an exception because it doesn't exist

// Example for checking associated items
$mapDto->hasColor('#EEEEEE'); // returns true
$mapDto->hasColor('#123456'); // returns false

The getColor() getter will even be type-hinted with the return type string here.
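For readers who want to see the mechanics, here is a hand-written sketch of an associative field with the same method names as the example above. MapSketchDto is hypothetical and the exception type (OutOfBoundsException) is an assumption; the generated class may throw a different exception and may treat setColors() differently. In this sketch, setColors() replaces the whole map, as a setter usually would.

```php
<?php
declare(strict_types=1);

// Hypothetical hand-written equivalent of an associative="true" field.
final class MapSketchDto
{
    /** @var array<string, string> color code => human-readable name */
    private array $colors = [];

    public function addColor(string $key, string $color): self
    {
        $this->colors[$key] = $color;

        return $this;
    }

    /** @param array<string, string> $colors Replaces the whole map. */
    public function setColors(array $colors): self
    {
        $this->colors = $colors;

        return $this;
    }

    // Typed return value; a missing key is an error, not a silent null.
    public function getColor(string $key): string
    {
        if (!array_key_exists($key, $this->colors)) {
            throw new OutOfBoundsException(sprintf('No color for key "%s"', $key));
        }

        return $this->colors[$key];
    }

    public function hasColor(string $key): bool
    {
        return array_key_exists($key, $this->colors);
    }
}
```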

Immutability?

The DTOs can be generated as immutable objects, similar to "value objects" (Money, DateTime, …).
Even though the default should be mutable, there can be use cases where you do not want to fill the object partially or iteratively. In these cases it can be cleaner to make sure it is completely filled from the start, using the constructor, and is not unexpectedly modified from then on.

If you also set "required" (true) for the fields that should be there from the start, this adds some extra protection, which might be a bit of overkill, though.
Totally up to you. The plugin provides both ways.
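The usual pattern for immutable objects in PHP can be sketched by hand: all values go in through the constructor, and a "change" returns a modified copy. The class ImmutableOwnerSketchDto is hypothetical, and the with...() naming follows the convention known from PSR-7 messages and DateTimeImmutable-style APIs; the plugin's generated immutable DTOs may be shaped differently.

```php
<?php
declare(strict_types=1);

// Hypothetical hand-written immutable DTO: filled once via the constructor,
// and "changes" return a modified copy instead of altering the original.
final class ImmutableOwnerSketchDto
{
    private string $name;
    private int $birthYear;

    public function __construct(string $name, int $birthYear)
    {
        $this->name = $name;
        $this->birthYear = $birthYear;
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function getBirthYear(): int
    {
        return $this->birthYear;
    }

    public function withName(string $name): self
    {
        // Private properties of another instance of the same class are
        // accessible, so the clone can be adjusted before it is returned.
        $clone = clone $this;
        $clone->name = $name;

        return $clone;
    }
}
```

Any code holding a reference to the original object can rely on it never changing underneath.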

Mapping in complex cases

When the incoming data you want to map doesn’t fully fit the desired DTO structure, a mapper layer can help fill the DTO.
See this article for details.

In my case, I had to do some basic mapping for custom Jira fields:

    /**
     * @param string $issue
     *
     * @return \App\Dto\Jira\IssueDto|null
     */
    public function getIssue(string $issue): ?IssueDto {
        $result = $this->getJiraResult($issue);
        if (!$result) {
            return null;
        }

        $result['status'] = $result['fields']['status']['name'];
        $result['priority'] = $result['fields']['priority']['name'];
        $result['summary'] = $result['fields']['summary'];
        $issueDto = IssueDto::create($result, true);

        // customfield2 may be missing or null, so fall back to an empty list
        $versions = $result['fields']['customfield2'] ?? [];
        $version = array_shift($versions);
        if ($version) {
            $issueDto->setVersion($version['name']);
        }

        return $issueDto;
    }
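One design option is to pull the array juggling out into a pure function, so that it can be unit-tested with a fixture array and no HTTP call. The following is a hypothetical sketch of that variant: mapJiraIssue() is a made-up name, the field names (status, priority, summary, customfield2) are taken from the method above, and missing values become null instead of triggering notices.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pure mapping step: flattens the nested Jira fields used in the
 * example into top-level keys matching the DTO field names.
 *
 * @param array<string, mixed> $result Decoded Jira issue data
 * @return array<string, string|null>
 */
function mapJiraIssue(array $result): array
{
    $fields = $result['fields'] ?? [];

    $mapped = [
        'status' => $fields['status']['name'] ?? null,
        'priority' => $fields['priority']['name'] ?? null,
        'summary' => $fields['summary'] ?? null,
        'version' => null,
    ];

    // The version lives in a custom field holding a list; take the first entry.
    $versions = $fields['customfield2'] ?? [];
    if ($versions) {
        $first = reset($versions);
        $mapped['version'] = $first['name'] ?? null;
    }

    return $mapped;
}
```

The returned array could then be handed to `IssueDto::create(...)` in the same way as in the method above.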

Performance

As mentioned at the beginning, the array-to-object mapping can reduce dispatching performance, depending on how heavily it is used.
You need to find out whether the advantages outweigh the disadvantages; in many cases, they most likely will.
It is reliability and coding speed (= money, since developers are usually the most expensive resource) on the one side vs. a small decrease in runtime speed on the other.
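To find out what the overhead means for your own setup, a rough micro-benchmark is often enough. The sketch below is hypothetical (BenchUser, BenchHead and benchmark() are made-up names): it compares plain nested array access with mapping the same data into small objects and reading it through getters, timed with hrtime() (PHP 7.3+).

```php
<?php
declare(strict_types=1);

// Minimal stand-in DTOs for this benchmark only (hypothetical names).
final class BenchUser
{
    private string $login;

    public function __construct(string $login)
    {
        $this->login = $login;
    }

    public function getLogin(): string
    {
        return $this->login;
    }
}

final class BenchHead
{
    private BenchUser $user;

    public function __construct(BenchUser $user)
    {
        $this->user = $user;
    }

    public function getUser(): BenchUser
    {
        return $this->user;
    }
}

/** Runs $fn $iterations times and returns the elapsed wall time in seconds. */
function benchmark(callable $fn, int $iterations): float
{
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }

    return (hrtime(true) - $start) / 1e9;
}

$data = ['user' => ['login' => 'octocat']];
$iterations = 100000;

// Plain array access.
$arrayTime = benchmark(function () use ($data): string {
    return $data['user']['login'];
}, $iterations);

// Mapping into objects on every iteration, then reading through getters.
$dtoTime = benchmark(function () use ($data): string {
    $head = new BenchHead(new BenchUser($data['user']['login']));

    return $head->getUser()->getLogin();
}, $iterations);

printf("array: %.4fs, dto: %.4fs\n", $arrayTime, $dtoTime);
```

Absolute numbers depend on the PHP version, OPcache/JIT settings and hardware, so only the ratio on your own machine is meaningful; the closure call adds the same overhead to both sides.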

Also, don’t forget how easy it is to adjust existing code this way (e.g. using IDE refactoring), and that static analysis tools can verify the code more accurately after each change.
Adding, renaming and removing fields (or soft-deprecating them first) can be done very quickly, even on the largest code bases.

Outlook

We definitely need to battle-test it more to prove its readiness for larger applications.
Test it and check how valuable it is for you compared to plain array structures, and how much more productive you are.
Reach out with your findings, and we can collect the feedback for others in the wiki.

There are quite a few things left to do, and some ideas on the table to be implemented. For example, it could make sense to build a generic PHP solution and make it more framework-agnostic.
Check out the TODOs, open issues and the like, and feel free to help out.

Happy baking!

Update (March 2024)

The DTO plugin got a schema generator. You can now feed it an actual API result as input (or the equivalent XSD schema) and get your DTOs within seconds.
No need to write them manually anymore, even with multiple nested levels.

See the demo.
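To give an idea of how generating definitions from an API result can work in principle, here is a hypothetical sketch that infers `<dto>`/`<field>` definitions from a decoded sample array. It is not the plugin's generator and uses none of its API: inferDtoXml() and scalarType() are made-up names, nested associative arrays become their own DTO named after the key, lists become `type[]` based on their first element, and anything else falls back to `mixed` (whether the plugin accepts that type is an assumption not covered here). Lists of objects are not handled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: infers <dto> definitions from a sample API result.
 * Returns one XML snippet per DTO, keyed by DTO name, parent first.
 *
 * @return array<string, string>
 */
function inferDtoXml(string $name, array $sample): array
{
    $fields = [];
    $nested = [];
    foreach ($sample as $key => $value) {
        $isList = is_array($value) && ($value === [] || array_keys($value) === range(0, count($value) - 1));
        if (is_array($value) && !$isList) {
            // Associative array: treat it as a nested DTO and recurse.
            $type = ucfirst((string)$key);
            $nested += inferDtoXml($type, $value);
        } elseif ($isList) {
            // List: derive the element type from the first entry, e.g. string[].
            $type = ($value === [] ? 'mixed' : scalarType($value[0])) . '[]';
        } else {
            $type = scalarType($value);
        }
        $fields[] = sprintf('    <field name="%s" type="%s"/>', $key, $type);
    }

    return [$name => sprintf("<dto name=\"%s\">\n%s\n</dto>", $name, implode("\n", $fields))] + $nested;
}

/** Maps gettype() results to PHP type names; unknown types become "mixed". */
function scalarType($value): string
{
    switch (gettype($value)) {
        case 'integer':
            return 'int';
        case 'double':
            return 'float';
        case 'boolean':
            return 'bool';
        case 'string':
            return 'string';
        default:
            return 'mixed';
    }
}
```

In practice, the sample would come from something like `json_decode($response, true)`, and the generated snippets would still need reviewing, e.g. for fields that happen to be null in the sample.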

2 Comments

  1. I like this! I do a lot of "old school" back-end CLI integrations that rely on data specs with existing XML Schemas, usually in the form of DTDs, e.g. cXML.

    I always validate XML against the schemas when first receiving it or as the last step when producing it (e.g., using seromenho/XmlValidator).

    This plug-in would allow for much better control throughout the process.

    I guess I’ll have to write a custom DTD => DTO engine, though!

  2. Interesting. Yeah, you could try to build on top of this, and have such a translation tool provide the required dto.xml files.
    Let me know if you happen to have some alpha release here or so.
