Why Data Transfer Project (DTP) is the future of data portability between online services
Data Transfer Project (DTP) is an open source data portability standard that tries to solve one of the biggest problems we face when migrating data: data lock-in, the inability to extract all the data we have stored in one service in order to move to another. Recently the project, backed by Google, Facebook, Microsoft, and Twitter, presented the specification and architecture that will make a common API for transferring user data between services possible. The key is direct communication without intermediaries and, most importantly, a common data format.
The idea goes back to 2007, when the Data Liberation Front emerged: a group of Google engineers who have since promoted freeing user data to avoid exactly this kind of problem with the company's services. Thanks to them we have Google Takeout, a tool that lets users back up their files from all Google services.
“Users should be able to control the data they store in any of Google’s products. Our team’s goal is to make it easier to move data in and out.”
Now the project at hand is even more ambitious: it aims to create a common standard capable of converting any proprietary API into a set of common formats, building an ecosystem of tools around it.
The future of Service-to-Service portability
The goal is for users to be able to transfer their data from one service to another without relying on an intermediary (another SaaS, for example) or having to download and re-upload all the data. Many services base their business model on migrating emails, contacts, and so on from one provider to another, for example when you switch from Yahoo to Gmail.
To tame the complicated world of data portability, the Data Transfer Project defines a series of adapters that transform proprietary formats into a set of common Data Models. This allows data to be transferred from one service provider to another while maintaining control over the security and integrity of the data.
Obviously, these Data Models are limited in the information they can carry from one service to another, since it is impossible to model everything; but with the use cases already created, such as photos or contacts, the expected functionality is broadly covered. Some additional mechanism is still needed to carry certain more specific metadata.
How does it work? The Data Transfer Project architecture
There are three fundamental components described in the Data Transfer Project white paper:
- Data Models that provide a common format for transferring data.
- Adapters that provide the methods to convert each provider's proprietary data and enable authentication within the system.
- Task Management that performs the background tasks needed to execute the data migration.
Through the Data Models we define the format of the data we want to exchange. You can check the existing ones in the project's GitHub: emails, contacts, photos, calendars, tasks, etc.
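To make the idea concrete, a common Data Model is essentially a plain, provider-neutral value type. The class below is a minimal sketch of what a photo model could look like; the names are hypothetical and do not match DTP's actual classes, which live in the project's repository.

```java
// Hypothetical, simplified sketch of a DTP-style common Data Model for photos.
// Every provider's adapter would translate its proprietary format to and from
// a neutral type like this one.
public final class PhotoModel {
    private final String title;
    private final String fetchableUrl; // where the importer can download the bytes
    private final String mediaType;    // e.g. "image/jpeg"
    private final String albumId;      // links the photo to an album model

    public PhotoModel(String title, String fetchableUrl,
                      String mediaType, String albumId) {
        this.title = title;
        this.fetchableUrl = fetchableUrl;
        this.mediaType = mediaType;
        this.albumId = albumId;
    }

    public String getTitle() { return title; }
    public String getFetchableUrl() { return fetchableUrl; }
    public String getMediaType() { return mediaType; }
    public String getAlbumId() { return albumId; }
}
```

The point of keeping the model this plain is that neither side of the transfer needs to know anything about the other provider's API.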
With the Adapters we translate between the data of a specific provider (the service API) and the common DTP Data Models. In addition to the adapters that transform data, there is another type of adapter that handles authentication. OAuth has been the most widespread mechanism, although DTP is agnostic about the type of authentication.
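A data adapter of this kind boils down to a mapping function from the provider's records to the common model. The sketch below uses hypothetical class names (the real interfaces are in the project's GitHub) to show an export adapter for contacts:

```java
import java.util.List;
import java.util.stream.Collectors;

// Common, provider-neutral model (simplified, hypothetical).
class CommonContact {
    private final String name;
    private final String email;
    CommonContact(String name, String email) { this.name = name; this.email = email; }
    String getName() { return name; }
    String getEmail() { return email; }
}

// A record as some provider's proprietary API might return it.
class VendorContact {
    final String displayName;
    final String primaryEmail;
    VendorContact(String displayName, String primaryEmail) {
        this.displayName = displayName;
        this.primaryEmail = primaryEmail;
    }
}

// Hypothetical export adapter: translates the vendor format into the common model.
class VendorContactExporter {
    List<CommonContact> export(List<VendorContact> vendorContacts) {
        return vendorContacts.stream()
                .map(v -> new CommonContact(v.displayName, v.primaryEmail))
                .collect(Collectors.toList());
    }
}
```

A matching import adapter on the destination service would do the reverse mapping, from `CommonContact` into its own API calls.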
Finally, the set of tasks needed to carry out a portability job lives in the other big component, Task Management. These tasks are responsible for making the calls between adapters, performing retry logic, and managing rate limits, pagination, individual notifications, etc.
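As an illustration of the retry logic such a task component might apply around adapter calls, here is a minimal exponential-backoff sketch; the names are hypothetical, not DTP code:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of the retry wrapper a task runner could apply around
// adapter calls: retry transient failures with exponential backoff.
class RetryingExecutor {
    static <T> T callWithRetries(Callable<T> call, int maxAttempts, long baseDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Delay doubles after each failed attempt: base, 2x, 4x, ...
                    Thread.sleep(baseDelayMs * (1L << (attempt - 1)));
                }
            }
        }
        throw last;
    }
}
```

A real task runner would also distinguish retryable errors (rate limits, timeouts) from permanent ones (invalid credentials), but the control flow is the same.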
How can developers try it out and participate?
DTP is still under development and is not ready for production use. However, there are several use cases we can already try by migrating data between some platforms; they are available in the project's GitHub.
All the code is written in Java and can be deployed via Docker. Some of its decoupled modules can be written in Python, C#/.NET, etc. The cloud platforms being used to test it include Google Cloud, Microsoft Azure, and Amazon AWS.
Within the published repository we can add new Providers using the set of interfaces described in GitHub. It is necessary to implement each of the parts described in the DTP architecture: a Data Adapter, an Auth Adapter and, potentially, a new Data Model.
You can also contribute to the Core Framework, which includes important pieces such as the host platforms and the storage system.