Sam, a storage administrator, manages a private cloud system with data centers in Germany and the U.S. For the past few months, he has observed that data movement between these data centers is consistently slow, and customers who use the system are frustrated by the delays. Since most customers are in America, he has been ordered to move all the data from Germany to the U.S. and shut down (or decommission) the German data center.
How can Sam move all the data safely?
Methods used
User interviews
Affinity mapping
User flow explorations
Rapid prototyping
Usability testing
SUS score
Understanding the problem
To understand the complete picture, I interviewed eight storage administrators who handle data centers of various sizes. Furthermore, I spoke to subject matter experts to dissect data movement policies and derive relationships between servers, data centers, and the cloud system.
These are some quotes and high-level themes derived from user interviews.
Exploring user flows to solve the problem
With the user and business goals in mind, I collaborated with subject matter experts to explore nine user flows to solve the problem. The decommissioning process was being developed by multiple scrum teams focusing on different parts of the cloud system, and these user flows helped all stakeholders align on the same goal by presenting an end-to-end picture. We then converged on one user flow based on cost-benefit analysis, technical feasibility, and time to release the product. The final user flow had the following six steps:
Select the data center to be decommissioned
View servers of the data center to be decommissioned and details of other data centers in the cloud system
Revise the active data management policy to remove any dependencies on the site to be decommissioned
Remove any other references on the site to be decommissioned from other data management rules
Resolve any server conflicts and references in high-availability groups
Monitor decommissioning
Aligning with user expectations
After choosing a feasible user flow based on user expectations, I started designing interfaces that align with the aforementioned high-level themes.
Guidance
Included guardrails to prevent the user from starting a futile decommission
Evaluated multiple explorations to summarize the situation and guide the user when the data from the decommissioned site cannot be accommodated by the other data centers.
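The guardrail above boils down to a capacity check before any data moves. A minimal sketch of that check, with hypothetical names and units (a real system would also account for replication overhead and per-site constraints):

```python
def can_absorb(decommissioned_used_tb, remaining_centers, headroom=0.10):
    """Check whether the remaining data centers can absorb the
    decommissioned site's data while keeping a safety headroom.

    remaining_centers: list of (capacity_tb, used_tb) tuples.
    """
    # Usable free space per center = capacity minus headroom minus current use.
    free = sum(cap * (1 - headroom) - used for cap, used in remaining_centers)
    return free >= decommissioned_used_tb

# Illustrative numbers: 400 TB to move out of the German site.
print(can_absorb(400, [(1000, 700), (500, 250)]))  # True
print(can_absorb(500, [(1000, 700), (500, 250)]))  # False
```

A check like this is what lets the interface stop a futile decommission up front instead of failing partway through the migration.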
Control
Identified data management rules that refer to the site being decommissioned so the user can edit them, and provided suggestions for creating a future-proof policy.
Ensured that the user removes unused rules and policies within the flow to prevent any confusion.
Flexibility
Recognized servers in other data centers that are either disconnected or administratively down. The user can fix these issues at any point and come back to the flow to continue the process.
Transparency
The user is shown a visualization that depicts data moving out of the site being decommissioned and the increase in data in other data centers.
The decommissioning progress can be optionally seen both at a server level and at a system level to help the user dive into granular details.
Testing and iteration
Tested the user flow with customers using a prototype connected to a dummy private cloud infrastructure. Iterated quickly on user feedback to include a breakdown of database-update progress. Overall, users found the flow intuitive, as reflected in both the qualitative research and an average SUS score of 85.
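For reference, a SUS score is computed per respondent from the standard ten-item questionnaire: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is scaled by 2.5 to give a 0-100 score. A minimal sketch (the responses below are illustrative, not actual study data):

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 responses."""
    assert len(responses) == 10
    total = sum(r - 1 if i % 2 == 0 else 5 - r  # even index = odd-numbered item
                for i, r in enumerate(responses))
    return total * 2.5

# One hypothetical respondent:
print(sus_score([5, 2, 4, 1, 5, 2, 5, 1, 4, 2]))  # 87.5
```

The reported 85 is the mean of such per-respondent scores, which is well above the commonly cited SUS benchmark of 68.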
Final thoughts
Although storage admins were initially nervous about losing data during decommissioning, they gained confidence after using the aforementioned user flow. This is primarily because the flow broke a complex procedure into its parts, progressively disclosing bite-sized information.
My favorite part of this project was the process of exploring multiple user flows. We leveraged these flow charts to align stakeholders across multiple teams towards the same goal.