The phone call from an old friend
On a summer morning in April 2020, I received a call from a close friend named Tran Lam. After we asked about each other’s situation and discussed the Covid epidemic, he asked me for advice on upgrading his company’s software system.
It is the system he started building 15 years ago. With the vision of a young entrepreneur at that time, he set out to adopt new technologies to build his business from the ground up. By 2010, he owned an internal (on-premises) infrastructure with dozens of servers, operating 7 to 8 business services concurrently. The successful system yielded positive commercial outcomes, and the company expanded to over 500 employees.
Recently, he felt that business growth had slowed down. He believed the pandemic was not to blame; if anything, it had created more opportunities. The problem was that updating existing products and creating new ones did not keep up with the requirements and goals of the business side, even though top technical experts put in a lot of effort. He said that with every new update, the technical team was terrified, since they did not know where to begin in the tangled system or whether the changes would disrupt it. The system was complex, and the team that created it had changed over time; some people had left, and others had moved on to other work, so nobody fully comprehended it anymore.
Each newly added feature created a new level of complexity, and no one in his company could fully grasp the extent of its impact. Meanwhile, steadily increasing traffic overloaded the system and caused more problems than usual. These problems could not be solved by the previous remedies, such as upgrading servers and extending bandwidth. First, the system was much larger and more complex than it had been five years earlier. Second, the software was built on the traditional monolithic architecture, which suits small systems. For a large and complex system, combining many servers to improve a monolith’s performance became much more costly and complicated than other solutions. The technical team spent a lot of time debating whether to keep the system or re-architect it. Usually, the group decided to keep it for lack of time to build a new one. Meanwhile, business growth was not as good as it had been.
The system that used to be the rocket that pulled the business up had turned into a badly outdated engine. Now he was planning to start a new business: a digital ticket platform. He wanted to take this opportunity to find a new technology solution not only for that application but also for other digital services in the future. Talking over the phone was not enough; we determined that this was a big digital transformation problem, not just a particular technical issue. We met at his company the next morning.
Digital transformation, from theory to practice
Digital transformation is fascinating in theory and even more interesting in practice. I think practical stories like this one can help other companies find the right approach to transforming their digital systems, so I would like to share more details below.
Digital transformation is not a matter of moving from paper-based documents to digital ones on a server so that software can search, sort, and edit them. That is just digitization. The process of developing new business models based on digital technologies is called digitalization. Many companies have reached the digitalization step, but not yet digital transformation. For an enterprise that already has a digital system and can exploit it, like Mr. Lam’s, digital transformation means transitioning from an old system to a modern one to achieve better business efficiency.
I suggested that he transform the current monolithic architecture into a microservices one, apply a scaling strategy to highly loaded services, and move the system to the Cloud. He had thought about it before, but several barriers prevented him from executing this approach:
- Firstly, the system is too large. The team does not know the best way to migrate this “big monster” to the Cloud.
- Secondly, there is the time issue. Spending 12 months or even longer transforming the entire system, without knowing for sure whether the new one will work well, is too risky.
- Thirdly, there is the ability to master the technology and apply it to the specific business once the transformation is done. Technology today is broader and more complex than in the past. His team needs time to learn how to handle it while new business requirements keep arriving. Recruiting experts is not easy: they need to be fluent in software architecture, platform components, and infrastructure, proficient in DevOps and Cloud, and more. In addition, a new expert also has to learn the company’s business in order to understand the current system and optimize it.
- The last one is the cost issue. Running a system on the Cloud is often cheaper than running it on-premises, but the savings are small, and the difference is not worth it, unless you have a good architecture and organize the system components well. And again, you still need experts for each type of Cloud you want to use.
After that meeting, we worked directly with his team to resolve each issue. Thanks to the support of the BA and QA teams, I got a good grasp of the business, reviewed the performance analysis, and saw for myself which services and functions fell short. Through the team leaders, I could see more clearly the architecture and the operations that link the components, from the infrastructure to the application.
With the diagnosis made, it was time to prescribe the treatment. Here are the “tricks”.
For the first three problems, I suggested that the team split the big monolith into a few smaller systems that are easier to handle in a short time. They must fit together in a simple way and not be over-engineered. These smaller systems could be considered “macro-services” (not really “micro-services”, because they are not that small; too many small things are hard to manage). The team was asked to split out only the most resource-consuming service first. We should avoid the greedy approach of converting a whole series of services simultaneously; after the first successful split, we split more, but no more than three services in one Sprint.
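The splitting order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ServiceUsage` type, the `cpu_share` metric, and the service names are hypothetical, and a real team would rank services by whatever resource measurements it actually collects.

```python
from dataclasses import dataclass

MAX_SPLITS_PER_SPRINT = 3  # the limit mentioned in the text


@dataclass(frozen=True)
class ServiceUsage:
    """Hypothetical record: a service inside the monolith and its resource share."""
    name: str
    cpu_share: float  # fraction of total CPU used; an assumed metric


def plan_sprints(services, max_per_sprint=MAX_SPLITS_PER_SPRINT):
    """Return a list of sprints, each a list of service names to split out.

    The first sprint contains only the most resource-consuming service;
    later sprints take the rest in descending order, at most
    `max_per_sprint` services at a time.
    """
    ordered = sorted(services, key=lambda s: s.cpu_share, reverse=True)
    if not ordered:
        return []
    plan = [[ordered[0].name]]
    rest = ordered[1:]
    for i in range(0, len(rest), max_per_sprint):
        plan.append([s.name for s in rest[i:i + max_per_sprint]])
    return plan


# Hypothetical usage figures for five services.
usage = [
    ServiceUsage("search", 0.15),
    ServiceUsage("orders", 0.40),
    ServiceUsage("warehouse", 0.20),
    ServiceUsage("delivery", 0.10),
    ServiceUsage("faq", 0.05),
]
sprints = plan_sprints(usage)
```

With these made-up figures, the first sprint handles only the order service, and the remaining services follow in batches of at most three.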
The next step is to pack those services into containers (a form of isolation of the service environment at the operating-system level) instead of virtual machines (VMs, which isolate the service environment at both the operating-system and hardware levels). That saves resources when scaling, because we scale only the resources the service really needs rather than an entire VM, and it makes the services easy to port to other infrastructure.
Then there is the issue of internal communication between the system’s services. It is always a “nightmare” in large companies because of the system’s overlapping constraints. For example, the order processing service needs to communicate with the warehouse and delivery services. Sometimes the delivery service needs to communicate back to the warehouse service, when there is a return or exchange. The warehouse service, in turn, needs to notify the customer-facing search service. All these communications were handled by standard REST APIs, a synchronous point-to-point style of communication that requires the sender to know the receiver’s address and send the information directly to it. In synchronous communication, the two sides can fall out of sync when either the sender or the receiver has a problem or changes, leading to data loss in the communication process. Data loss can disrupt the customer experience; for example, customers may not receive their orders because the system lost their payment data. This was the main factor limiting the scalability of the system. To deal with this complexity, I suggested that the team switch to asynchronous communication using a Kafka message broker.
This type of communication lets the parties hand the transfer of information over to an intermediary called the message broker. That decouples the sender from the receiver: neither has to care about the other’s address any longer. Because the broker persists messages, a temporary failure on the receiving side no longer causes data loss. Moreover, each party can now develop new operations independently without interfering with or modifying old ones.
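To make the decoupling concrete, here is a minimal in-memory sketch of the broker idea in Python. It is a toy stand-in, not Kafka and not the Kafka client API: the `InMemoryBroker` class, the topic name, and the consumer names are all assumptions made for illustration. It only shows that the sender writes to a topic without knowing who reads it, and that each consumer reads at its own pace from a stored log.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy message broker: an append-only log per topic, an offset per consumer."""

    def __init__(self):
        self._log = defaultdict(list)     # topic -> list of messages, in order
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def publish(self, topic, message):
        # The sender only names a topic; it knows nothing about the receivers.
        self._log[topic].append(message)

    def poll(self, topic, consumer):
        # Each consumer gets every message it has not seen yet.
        key = (topic, consumer)
        start = self._offsets[key]
        messages = self._log[topic][start:]
        self._offsets[key] = start + len(messages)
        return messages


broker = InMemoryBroker()

# The order service publishes while the warehouse service is (hypothetically) down.
broker.publish("order-created", {"order_id": 1})
broker.publish("order-created", {"order_id": 2})

# Later, the warehouse service comes back and catches up; nothing was lost.
warehouse_batch = broker.poll("order-created", consumer="warehouse")

# A service added later reads the same history without changes to the sender.
search_batch = broker.poll("order-created", consumer="search")
```

A real Kafka cluster adds partitioning, replication, and retention policies on top of this idea, which is where most of the operational effort goes.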
The next step is handling data, especially the database and caching. I asked the team to limit how often the database is accessed as much as possible. For business functions that do not require the latest data, or for data that is not critical, caching should be used as much as possible. For example, it is very inefficient for the system to retrieve FAQ (frequently asked questions) data from the database on every request. Content that rarely changes, like this, should be served from a CDN.
The final step is to install system performance monitoring. After each change, we should measure and evaluate until we get a satisfactory result.
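One common way to reduce database access is a cache with a time-to-live in front of the query. The sketch below is a minimal, assumption-laden example: the `TTLCache` class and the `load_faq_from_db` function are hypothetical, and a production system would more likely use a shared in-memory cache service than a per-process dictionary.

```python
import time


class TTLCache:
    """Minimal cache-aside helper: reuse a value until its time-to-live expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so the behaviour can be tested
        self._entries = {}    # key -> (value, expiry time)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]   # still fresh: no database access
        value = loader(key)   # missing or expired: go to the database once
        self._entries[key] = (value, now + self._ttl)
        return value


db_calls = []


def load_faq_from_db(key):
    """Hypothetical stand-in for a slow database query."""
    db_calls.append(key)
    return ["How do I reset my password?"]


faq_cache = TTLCache(ttl_seconds=300)
first = faq_cache.get_or_load("faq", load_faq_from_db)
second = faq_cache.get_or_load("faq", load_faq_from_db)  # served from the cache
```

The time-to-live is the trade-off knob: a longer value means fewer database reads but staler data, which is acceptable for content like an FAQ page.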
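As a rough idea of what "measure after each change" can look like at the code level, the following Python sketch records call latencies and reports a percentile. The `LatencyRecorder` class and the `handle_request` function are hypothetical, and the nearest-rank percentile is just one simple choice; a real deployment would normally rely on a dedicated monitoring stack.

```python
import math
import time
from functools import wraps


class LatencyRecorder:
    """Collects call durations so they can be compared before and after a change."""

    def __init__(self):
        self.samples = []  # durations in seconds

    def timed(self, func):
        """Decorator that records how long each call to `func` takes."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.samples.append(time.perf_counter() - start)
        return wrapper

    def percentile(self, pct):
        """Nearest-rank percentile of the recorded durations."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]


recorder = LatencyRecorder()


@recorder.timed
def handle_request(n):
    """Hypothetical request handler whose latency we want to track."""
    return sum(range(n))


for _ in range(20):
    handle_request(10_000)

p95_seconds = recorder.percentile(95)
```

Comparing a value such as `p95_seconds` before and after each split or caching change gives the team a concrete number to evaluate instead of an impression.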
During the transformation, implementing and deploying the system as designed still takes time; two to three months was the shortest time the team could commit to for deploying the first version. This relates to the second problem mentioned by Mr. Lam: he always worried that if the implementation took too long, it would interrupt business goals, miss opportunities, and incur costs. Building a Kafka broker for learning purposes takes less than a day. But building a reliable Kafka message broker cluster requires a lot of time and effort, and the quality of the result is still in doubt. This is where the Cloud and managed services (services built, optimized, and maintained by a third party) come in. For now, almost all good-quality managed services come from international providers, at prices that do not suit the Vietnamese market. For example, Confluent’s Kafka service may cost more than one billion VND for 50 Mbps of average traffic. The public Clouds offer everything; the quality may not match that of a specialized third-party managed service provider, but the cost is more acceptable. So even though the cost of the infrastructure and of the experts to operate it is not really cheap, I still suggested that Mr. Lam move the important services to the Cloud to speed up the transformation process.
Regarding cost, although we can lease VM and LB infrastructure on the Cloud more cheaply than building it on-premises, we can further reduce the cost of running applications through a platform on top of the infrastructure. We can also improve application quality by integrating efficient services into that platform. There is no need to integrate too many! In most software projects, no more than 10 important services, for example cache, queue, and container services, are enough to make a significant difference in performance. But during the transformation, I could not help Mr. Lam build that platform. It would be a very large project requiring many experts working together, while Mr. Lam needed the team to focus on business requirements instead of investing too much time and money in building such a huge platform.
The result and new vision
Since then, Mr. Tran Lam has been able to rest assured with the new system and focus on business problems. He updates me on new projects more often than ever before, and he talks more about business news than about technical worries.
From 2020 to 2021, in the role of technical-solution director, I continued to consult on or directly participate in successful digital transformations for many businesses. Examples include a microservices architecture transformation for Vietnamese startup Logivan; e-commerce and services management for Australia’s Wilson Security; and a Digital to Consumer (D2C) e-commerce system for the Asia Pacific region of Electrolux Group.
I realized that, rather than working one-on-one with every business, if the experience of other technical experts and my own could be incorporated into one platform, it would assist more businesses while accelerating the process and reducing its cost. I thought I still owed Mr. Lam this platform. Starting from his problems, I kept thinking about a platform that integrates the important managed services, rather than too many scattered ones, focuses on application quality (instead of infrastructure only), and is optimized for the best performance and cost; that would be really helpful for businesses like Mr. Lam’s.
How Sun Cloud started
The case of Tran Lam is just one of hundreds of stories of businesses undergoing digital transformation. From my experience consulting on digital transformation strategies, I realized that the platform is the key to digital transformation and the next step after infrastructure transformation. Therefore, in 2021, my colleagues and I started to build the Sun Cloud platform.
With the philosophy of bringing the best quality products and services at competitive prices, Sun Cloud is a business companion in the digital transformation journey.
Problems that Sun Cloud resolves
– Providing the most essential tools, optimized in one stack and operated and managed on the Cloud by Sunteco’s experts. Examples include application life-cycle management, scalable and distributed databases, in-memory cache, and log systems.
– Providing a ready-to-use environment that already packs software and hardware solutions, allowing customers to transform their applications to a container-based microservices architecture quickly. More significantly, Sun Cloud enables customers to fully utilize the benefits of microservices and container architecture, such as rapid responsiveness under high load, zero downtime, and distribution of the system across multiple locations.
– Saving time and resources spent on configuration management and infrastructure operation, even for infrastructure on the Cloud, so that customers can focus on developing the application layer and deploy business applications and services more easily and quickly.
– Prioritizing high recoverability, high availability, and service resilience after failures. Roll-back and roll-forward allow customers to return to any earlier version or go straight to a new version easily, without data loss.
– Optimizing the elasticity and load-carrying capacity of applications and services through the Sun Spinner service, to ensure the best customer experience.
– Offering, through the Sun Highway service, a high-performance asynchronous communication channel for applications and services, with the capacity to store communication history. It also features real-time monitoring with efficient visualization.
– Providing a channel for team collaboration and role-based management through a user-friendly interface for both technical and non-technical users.
– Providing local support from technical experts, not only for the infrastructure but also for the application architecture transformation.
Sign up now to receive $100 into your account: https://dashboard.sunteco.vn/
To offer the best solutions for your problems, the Sunteco team and I always look forward to hearing from you. Feel free to contact us via:
Phone: +84 24 5678 3868
Chau Nguyen (Charles)
Chief Technology Officer
CTO Chau Nguyen started his high-tech development career in Singapore in 2007. In 2015, he started building Cloud technologies and platforms for the US market, then consulted on technology transformation solutions for various companies, including well-known brands such as logistics startup Logivan (microservice re-architecting), the carrier Mobifone (scaling up the MobiEdu education platform), and Electrolux (a global e-commerce scaling solution).