
Leopard

Project Description

Leopard is an IBM Research effort to address the complexity of modern data centers, which are composed of a very large number of heterogeneous physical and virtual systems hosting middleware and applications. To manage everything in a data center, system administrators usually need a plethora of management tools, since most tools either manage a single type of device or are domain specific. The boundaries between the different management tools limit the productivity of system administrators in their daily tasks, as each tool offers only a partial view of the entire managed environment.

Rapid advances in data center virtualization technologies are enabling a new holistic vision of the data center of the future, combining server, storage and network virtualization. The Leopard project works toward a single, integrated management solution for monitoring and controlling servers, storage, IP networks, and virtualization. Past user studies on system administration have shown the need for better support for integrating information about multiple system components, and Leopard strives to meet that need.

SPARK: Virtualization Management

Modern enterprise data centers are complex distributed environments that are built incrementally over time to satisfy application requirements. Most often configuration decisions, like which host to choose to run an application or which storage to use for application data, are predominantly dictated by the application requirements at the time of its provisioning. However, as application needs change over time, it becomes a challenging task for administrators to optimize the data center configuration to fit these new requirements. Such optimizations may require migrating an application to a new server, migrating its storage to a different controller or re-assigning I/O paths between server and storage.

Traditionally, such reconfigurations have been extremely disruptive, as both applications and their data are tightly coupled with the physical resources. Making any change at the physical resource level meant shutting down, or at least quiescing, the application and restarting it after reconfiguration. This discourages administrators from performing any reconfiguration optimizations and eventually leads to over-provisioning that worsens with time. For example, to accommodate backups within their time window, the administrator might be forced to over-provision the number of ports assigned to the backup application, even though the backup runs only for a few hours at night.

With the fast-growing acceptance of virtualization technologies, it is now becoming possible to perform such reconfiguration optimizations with zero or minimal disruption to applications. Server virtualization technologies like VMware, Xen, and IBM POWER Virtualization allow applications to be hosted inside "virtual" machines, which can be independently mapped to any compatible physical machine. Live migration technologies like VMware VMotion or Xen Live-Migration allow applications to be migrated from one physical host to another with zero downtime. Similarly, storage virtualization technologies like IBM SAN Volume Controller (SVC) virtualize storage into virtual disks, called vdisks, and can live migrate vdisks from one physical storage controller to another without any disruption in storage access. There are similar network virtualization technologies from Cisco and other networking vendors.

The challenge in data center optimization now is to automatically perform (or recommend) such optimizations in an integrated (server, storage and network) manner for power, performance and load-balancing reasons. Most competing products today focus on a single layer of optimization. For example, VMware Distributed Resource Scheduler (DRS) manages virtual machines only through server reconfigurations and thus cannot plan or orchestrate any storage or network reconfiguration.

In contrast, our virtualization solution aims to perform integrated data center optimizations by leveraging the virtualization advances at each individual layer. Our integrated optimization engine, based on a novel combination of multi-dimensional knapsack and stable-marriage algorithms, obtains a global end-to-end view of the data center and the performance characteristics of each component to dynamically decide the right optimization for the environment: a virtual machine migration, a virtual storage migration, or an I/O path re-assignment through virtual networks. For the backup example above, SPARK can dynamically change the number of ports available to the backup application and/or shift load from the storage controllers involved in backup. This integrated orchestration, combined with innovative optimization plan selection, differentiates SPARK from any competitive technology that will be available in the near future.
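
As a rough illustration of the stable-marriage half of that combination, the sketch below runs a capacity-aware deferred-acceptance (Gale-Shapley style) match between workloads and hosts. It is not SPARK's actual algorithm, which this page does not detail; the function name, workload names, host names, preference lists, and slot counts are all hypothetical.

```python
def stable_assign(workload_prefs, host_prefs, capacity):
    """Deferred acceptance with per-host capacities (illustrative only).

    workload_prefs: workload -> list of hosts, most preferred first
    host_prefs:     host -> list of workloads, most preferred first
    capacity:       host -> number of workloads the host can hold
    Returns a dict mapping each placed workload to its host.
    """
    # Rank lookup so a host can compare two workloads quickly.
    rank = {h: {w: i for i, w in enumerate(p)} for h, p in host_prefs.items()}
    next_choice = {w: 0 for w in workload_prefs}  # next host each workload tries
    held = {h: [] for h in host_prefs}            # tentative placements
    free = list(workload_prefs)

    while free:
        w = free.pop()
        prefs = workload_prefs[w]
        if next_choice[w] >= len(prefs):
            continue  # w has tried every host and stays unplaced
        h = prefs[next_choice[w]]
        next_choice[w] += 1
        if w not in rank[h]:
            free.append(w)  # host does not accept w at all; try the next host
            continue
        held[h].append(w)
        if len(held[h]) > capacity[h]:
            # Over capacity: evict the workload this host ranks lowest.
            worst = max(held[h], key=lambda x: rank[h][x])
            held[h].remove(worst)
            free.append(worst)

    return {w: h for h, ws in held.items() for w in ws}


# Hypothetical example: three workloads competing for two hosts.
placement = stable_assign(
    {"db": ["hostA", "hostB"], "web": ["hostA", "hostB"], "backup": ["hostA", "hostB"]},
    {"hostA": ["db", "web", "backup"], "hostB": ["backup", "web", "db"]},
    {"hostA": 1, "hostB": 2},
)
print(placement)
```

In a real engine the preference lists would presumably be derived from measured performance and resource demands rather than written by hand, and the knapsack side would handle multi-dimensional capacity (CPU, memory, I/O) instead of a single slot count.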

iCharge

Most organizations are becoming increasingly reliant on IT products and services to manage their daily operations. The total cost of ownership (TCO), which includes hardware and software purchase costs, management costs, and so on, has increased significantly. CIOs have been struggling to justify the increased costs while still fulfilling the IT needs of their organizations. For businesses to be successful, these costs need to be carefully accounted for and attributed to the specific processes or user groups/departments responsible for the consumption of IT resources. This process is called IT chargeback and, although desirable, is hard to implement because of the increased consolidation of IT resources via technologies like virtualization. On one hand, virtualization aims to hide the complexity of the underlying heterogeneous resources, to increase overall utilization, to make systems more dependable and to simplify system management tasks. On the other hand, it adds another level of indirection, which makes it difficult to quantify the usage of individual applications and departments and their corresponding costs. Current IT chargeback methods are either too complex or too ad hoc. There are no well-defined cost models for provisioning virtual resources, migrating virtual resources, and other aspects of virtualization.
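
To make the attribution problem concrete, here is a minimal sketch of one simple chargeback model: splitting the cost of a shared, virtualized resource in proportion to measured usage. This is an assumed textbook-style model, not the iCharge cost model; the function name, department names, and figures are hypothetical.

```python
def allocate_cost(total_cost, usage_by_dept):
    """Split total_cost across departments in proportion to measured usage."""
    total_usage = sum(usage_by_dept.values())
    if total_usage <= 0:
        raise ValueError("no usage recorded; cannot apportion cost")
    return {
        dept: total_cost * used / total_usage
        for dept, used in usage_by_dept.items()
    }


# Hypothetical monthly cost of one virtualized host, with CPU-hours per department.
charges = allocate_cost(1200.0, {"finance": 300, "engineering": 500, "sales": 200})
for dept, amount in charges.items():
    print(f"{dept}: {amount:.2f}")
```

A purely proportional split ignores costs the text calls out as unmodeled, such as provisioning and migration of virtual resources, which is part of why simple schemes like this one tend to be too ad hoc in practice.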

ILM SAGE

Leopard's SAGE tool simplifies information life cycle management (ILM) for both cloud and traditional enterprise storage. ILM itself consists of two main tasks: calculating the value of data, and moving data to the appropriate storage tier based on its calculated value. Since the value of data changes over time, manual valuation of a large data set is a never-ending, time-consuming process. With SAGE, data valuation is simplified through the use of customer-created policies. Integrated with both GPFS and SVC, SAGE allows customers to create file-based and volume-based policies through a simple user interface. In addition, a policy assessment tool calculates and visually displays the effects of the policies being considered, to help customers make well-informed ILM decisions. A recommendation engine reports detected patterns that may be of interest to the customer, such as spikes in load or cost, and suggests potential solutions.

The second major task of ILM is the movement of data. In SAGE, data scheduling, placement, deletion, and migration are completely automated once the customer has created the policies. Migrations, in particular, are done live without disruption to applications and are scheduled intelligently to avoid overloading devices. Through the Leopard management product, SAGE gives administrators an interface that displays the load ILM is placing and has placed on the cloud, as well as the status of currently running migrations. In summary, SAGE provides a simple and intuitive data valuation mechanism for customers, who know the value of their data best, and a fully automated policy execution engine that reduces administrator workload.
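
The two ILM tasks, valuing data and then moving it, can be sketched as follows. This is a minimal illustration under assumed rules, not SAGE's policy language: the tier names, the idle-time thresholds, and the function names are hypothetical, and data value is approximated here by days since last access.

```python
# Hypothetical policy: ordered (max_idle_days, tier) rules; the first match wins.
POLICY = [(30, "fast"), (180, "capacity")]
DEFAULT_TIER = "archive"


def target_tier(days_since_access, policy=POLICY, default=DEFAULT_TIER):
    """Valuation step: map a file's idle time to the tier the policy assigns."""
    for max_idle_days, tier in policy:
        if days_since_access <= max_idle_days:
            return tier
    return default


def plan_migrations(idle_days_by_path, current_tier):
    """Movement step: list (path, from_tier, to_tier) for files on the wrong tier."""
    moves = []
    for path, idle_days in idle_days_by_path.items():
        dest = target_tier(idle_days)
        src = current_tier[path]
        if src != dest:
            moves.append((path, src, dest))
    return moves


# Hypothetical inventory: days since last access, and where each file lives now.
idle = {"/proj/a.dat": 5, "/proj/b.dat": 90, "/proj/c.dat": 400}
tiers = {"/proj/a.dat": "fast", "/proj/b.dat": "fast", "/proj/c.dat": "capacity"}
print(plan_migrations(idle, tiers))
```

Producing the plan separately from executing it mirrors the idea of a policy assessment step: the list of proposed moves can be inspected before any data is migrated.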

Resiliency Management

The field of disaster recovery (DR) has gained a lot of prominence since the 9/11 terrorist attacks. Businesses want redundancy in people, processes and information technology (IT) infrastructure in order to address machine failures, site failures, virus attacks, city-wide shutdowns, and similar events. A comprehensive disaster recovery solution therefore needs to address all of these issues. To address disaster recovery at the IT level, one has to provide redundancy at a combination of the application, server, network, and storage levels. Furthermore, redundancy at the storage level can be provided at a combination of the database, file system, or block storage levels. The focus of this project is on providing automated disaster recovery planning at the storage level. The goal of the project is to combine the diverse storage DR planning technologies and best practices into an integrated framework for storage and systems management.

An IBM systems management solution needs to provide an end-to-end disaster recovery solution that spans applications, networks, servers and storage. Customers typically ask for storage disaster recovery support at file system, database system, or block storage granularity because of the performance and functional trade-offs associated with these alternatives. However, since the files in a file system map to an underlying block storage volume, and the tables in a database container also map to either files or an underlying block storage volume, disaster recovery support is typically provided at the block storage volume level. Our effort focuses on block storage level disaster recovery support for open systems.

Planning for application-level disaster recovery (DR) encompasses the virtual machines, servers, network, and storage subsystems associated with the application. Customers typically spend a significant amount of resources hiring a team of consultants with expertise in the individual VM, server, and storage layers. We are working toward a CIM-compliant framework that discovers the end-to-end resources associated with an application. Administrators can use a simple yet comprehensive interface to specify DR requirements in terms of failure type (virus, misconfiguration, subsystem, site, etc.), RPO, RTO, and application impact. The output is a collection of plans with consistency group details of the primary and secondary devices, along with the replication protocol. The management solution generates human-like, real-world cascaded plans in which one or more replication technologies are used in combination at different layers, e.g., VMware snapshot technology combined with PPRC at the storage level.
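
The requirement-to-plan step described above can be sketched as a simple filter over a catalog of replication options. Everything in this example is an assumption for illustration: the catalog entries are generic technology classes rather than specific products, the RPO/RTO/cost figures are invented, and the function name is hypothetical. It does not attempt cascaded, multi-layer plans.

```python
# Hypothetical catalog: achievable RPO/RTO in minutes, relative cost,
# and the failure types each technology class protects against.
CATALOG = [
    {"name": "synchronous mirroring", "rpo_min": 0, "rto_min": 15,
     "cost": 10, "covers": {"subsystem", "site"}},
    {"name": "asynchronous mirroring", "rpo_min": 5, "rto_min": 30,
     "cost": 6, "covers": {"subsystem", "site"}},
    {"name": "point-in-time snapshot", "rpo_min": 60, "rto_min": 20,
     "cost": 2, "covers": {"virus", "misconfiguration"}},
]


def candidate_plans(failure_type, rpo_min, rto_min, catalog=CATALOG):
    """Return names of options that meet the requirement, cheapest first."""
    feasible = [
        t for t in catalog
        if failure_type in t["covers"]
        and t["rpo_min"] <= rpo_min   # loses no more data than allowed
        and t["rto_min"] <= rto_min   # recovers within the allowed time
    ]
    return [t["name"] for t in sorted(feasible, key=lambda t: t["cost"])]


# Site failure, at most 10 minutes of data loss, recovery within an hour.
print(candidate_plans("site", rpo_min=10, rto_min=60))
```

A planner that covers several failure types at once would need to combine entries, for example a snapshot for virus recovery plus mirroring for site failure, which is the kind of cascaded plan the text describes.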

Leopard User Interface

Leopard's user interface is centered on a Topology Viewer, which provides a visual representation of the end-to-end system structure. The Topology Viewer is based on a successful early version that appeared in TPC 3.1 and 3.3. The Leopard Topology Viewer adds an improved end-to-end system view and more system components to the previous version.

The idea of integrated system management has spread across the industry, as increasingly complex systems spur the need to bring together information from different system components to monitor and manage these systems. Very little attention has been given, however, to integrating the work of system administrators, who usually manage these complex systems as a team. Leopard integrates collaborative features with the data model and user interface to help system administrators communicate with each other and coordinate their activities.

Annotations

A feature commonly found on websites, annotations (also known as tags) would permit any system entity to be associated with any number of short phrases or single words, enabling administrators to overlay semantic structure on the system configuration. This structure would help individual administrators remember and understand the roles of different components, and would also help new administrators learn about a system. Furthermore, it would enhance search mechanisms, helping administrators find components within the system.
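
A minimal sketch of the data structure such a feature implies is a two-way index between entities and tags, so that both "what is this component for?" and "which components have this role?" are cheap lookups. The class name, method names, and entity identifiers below are hypothetical and are not taken from Leopard.

```python
from collections import defaultdict


class TagIndex:
    """Two-way mapping between system entities and free-form tags."""

    def __init__(self):
        self._tags_by_entity = defaultdict(set)
        self._entities_by_tag = defaultdict(set)

    def annotate(self, entity_id, tag):
        tag = tag.strip().lower()  # normalize so "Backup" and "backup" match
        self._tags_by_entity[entity_id].add(tag)
        self._entities_by_tag[tag].add(entity_id)

    def tags(self, entity_id):
        return sorted(self._tags_by_entity.get(entity_id, ()))

    def find(self, *tags):
        """Return the entities that carry all of the given tags."""
        sets = [self._entities_by_tag.get(t.strip().lower(), set()) for t in tags]
        return sorted(set.intersection(*sets)) if sets else []


# Hypothetical entities from a storage environment.
index = TagIndex()
index.annotate("switch-07", "backup path")
index.annotate("vdisk-112", "backup path")
index.annotate("vdisk-112", "Payroll")
print(index.find("backup path"))
print(index.find("backup path", "payroll"))
```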

Collaborative Discussions

Though the idea of threaded discussions is an internet feature almost twice as old as the world wide web, the benefit this powerful feature could provide to the field of system administration has not yet been explored.

Shared System Views

When system administrators work together on problem determination or other collaborative tasks, they could benefit from the ability to share views of system state in real time.

Task-Centered Workspaces

System administrators often work on multiple tasks at one time. On top of constantly monitoring system health, many of their tasks involve waiting (e.g., for a storage subsystem to migrate) or occur over relatively long periods of time (e.g., troubleshooting slow server response times). A user interface organized to enable quick and easy task switching by preserving task-specific views into the system could provide enormous benefit for admins who multitask.

Customization

In field studies of system administrators, we learned that they often create custom scripts and monitoring tools in order to effectively understand and manage their complex systems.

Interactive Charting and Monitoring

Situational awareness is crucial for system administrators in ensuring that their systems operate correctly and efficiently. Configuration and performance charts need to be available directly from the Leopard Topology Viewer, enabling system administrators to view detailed information about system components in the same interface they use to view relationships between system components. There is an important synergy here: the Leopard Topology Viewer is an excellent tool for navigating through a system and finding detailed information about components; charts will provide additional information about system components, and the Topology Viewer will be an ideal mechanism to find the components to chart. Charts will need to be customizable so that users can visualize any information they need. This workspace will be a highly customizable way for users to monitor system status. Together, these integrated reporting features improve system administrators' ability to monitor, troubleshoot, and plan their systems.

User Studies

Improvements to the user interface require user testing to determine their utility and effectiveness.

Selected Publications

  • Sandip Agarwala, Luis Angel D. Bathen, Divyesh Jadav, Ramani Routray, "Configuration Discovery and Monitoring Middleware for Enterprise Datacenter", in Proceedings of the IEEE Network Operations and Management Symposium (NOMS), 2010.
  • Sandip Agarwala, Ramani Routray, "Cluster Aware Storage Resource Provisioning in a Data Center", in Proceedings of the IEEE Network Operations and Management Symposium (NOMS), 2010.
  • Tapan Nayak, Ramani Routray, Aameek Singh, Sandeep Uttamchandani, Akshat Verma, "End-to-end Disaster Recovery Planning: From Art to Science", in Proceedings of the IEEE/IFIP Network Operations & Management Symposium (NOMS), 2010.
  • Ramani Routray, Shripad Nadgowda, "VLS: Simulated System Management Test Infrastructure as a Service", in Proceedings of IEEE/IFIP Network Operations & Management Symposium (NOMS), 2010.
  • Eser Kandogan, Paul P. Maglio, Eben M. Haber, John H. Bailey, "Scripting practices in complex systems management", in Proceedings of CHIMIT, 2009.
  • Sandip Agarwala, Luis Angel D. Bathen, Divyesh Jadav, Ramani Routray, "ParaDisE: parallel discovery engine for enterprise datacenters", in Proceedings of the 6th International Conference on Autonomic Computing (ICAC), 2009.
  • Aameek Singh, Mudhakar Srivatsa, Ling Liu, "Search-as-a-Service: Outsourced Search over Outsourced Storage", in ACM Transactions on the Web Journal, Vol-3 (4), 2009.
  • Nedyalko Borisov, Shivnath Babu, Sandeep Uttamchandani, Ramani Routray, Aameek Singh, "DiaDS: A Problem Diagnosis Tool for Databases and Storage Area Networks", Demonstration at International Conference on Very Large DataBases (VLDB), 2009.
  • Madhukar Korupolu, Aameek Singh, Bhuvan Bamba, "Coupled Placement in Modern Data Centers", in Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2009.
  • Michael Cardosa, Madhukar Korupolu, Aameek Singh, "Shares and Utilities based Power Consolidation in Virtualized Server Environments", in Proceedings of IFIP/IEEE International Symposium on Integrated Network Management (IM), 2009.
  • Hai Huang, Yaoping Ruan, Anees Shaikh, Ramani Routray, Chung-hao Tan, Sandeep Gopisetty, "Building End-to-End Management Analytics for Enterprise Data Centers", in Proceedings of IFIP/IEEE Integrated Network Management (IM), 2009.
  • Shivnath Babu, Nedyalko Borisov, Sandeep Uttamchandani, Ramani Routray, Aameek Singh, "DiaDS: Addressing the "my-problem-or-yours" syndrome with Integrated SAN and Database Diagnosis", in Proceedings of USENIX FAST, 2009.
  • Shivnath Babu, Nedyalko Borisov, Sandeep Uttamchandani, Ramani Routray, Aameek Singh, "Why Did My Query Slow Down?", in Proceedings of Conference on Innovations in Database Research (CIDR), 2009.
  • Ramani Routray, Shripad Nadgowda, "CIMDIFF: Advanced Difference Tracking Tool for CIM Compliant Devices", in Proceedings of USENIX Large Installation System Administration Conference (LISA), 2009.
  • David Eyers, Ramani Routray, Rui Zhang, Peter Pietzuch, Douglas Willcocks, "Towards a Middleware for Configuring Large-scale Storage Infrastructures", in Proceedings of International Workshop on Middleware for Grids, Clouds and e-Science (MGC), 2009.
  • Akshat Verma, Kaladhar Voruganti, Ramani Routray, Rohit Jain, "SWEEPER: An Efficient Disaster Recovery Point Identification Mechanism", in Proceedings of USENIX Conference on File and Storage Technologies (FAST), 2008.
  • Sandip Agarwala, Ramani Routray, Sandeep Uttamchandani, "ChargeView: An Integrated Tool for Implementing Chargeback in IT Systems", in Proceedings of IEEE Network Operations and Management Symposium (NOMS), 2008.
  • Sandip Agarwala, Sandeep Gopisetty, "iCharge: An intelligent tool for quantifying and implementing chargeback in virtualized IT systems", in IBM Academy 2nd Annual Conference on Virtualization, 2008.
  • Aameek Singh, Madhukar Korupolu, Dushmanta Mohapatra, "Server Storage Virtualization: Experiences with Integration and Load Balancing", in Proceedings of IEEE/ACM Supercomputing (SC), 2008.
  • Aameek Singh, Ling Liu, "SHAROES: A Data Sharing Platform for Outsourced Enterprise Storage Environments", in Proceedings of International Conference on Data Engineering (ICDE), 2008.
  • Aameek Singh, Sandeep Uttamchandani, Yin Wang, "Evaluating the Effectiveness of Information Extraction in Real-World Storage Management", in Proceedings of IEEE/ACM MASCOTS, 2008.
  • Aameek Singh, Ling Liu, Mustaque Ahamad, "Privacy Analysis and Enhancements for Data Sharing in *nix Systems", International Journal of Information and Computer Security (IJICS), Vol 2(4), 2008.
  • Michael Cardosa, Madhukar Korupolu, Aameek Singh, Sandeep Gopisetty, "Multi-Dimensional and Multi-Objective Constraints in Continuous Optimization of Data Centers", IBM Academy of Technology Annual Conference on Virtualization, 2008.
  • Aameek Singh, Madhukar Korupolu, Sandeep Gopisetty, "Automated Storage Virtualization Planning and Provisioning with IBM SAN Volume Controller", poster at IBM Academy of Technology Annual Conference on Virtualization, 2008.
  • Sandeep Gopisetty et al, "The Evolution of Storage Management: Transforming raw data into information", IBM Journal of Research and Development, Vol 52 (4, 5), 2008.
  • Sandeep Gopisetty et al, "Intelligent Planners for Storage Provisioning and Disaster Recovery", IBM Journal Special Issue on Storage Technologies, Vol 52 (4, 5), 2008.
  • Sandeep Gopisetty, Divyesh Jadav, Ramani Routray, "SANTria: A Tool for Managed System Configuration Problem Triage", in Proceedings of IBM Academy Proactive Problem Prediction, Avoidance and Diagnosis Conference, 2008.
  • Eben M. Haber, John H. Bailey, "Design guidelines for system administration tools developed through ethnographic field studies", in Proceedings of CHIMIT, 2007.
  • Aameek Singh, Madhukar Korupolu, "Integrated Provisioning Planning for Virtual Appliances and Images", poster at Haifa Systems & Storage Conference (SYSTOR), 2007.
  • Aameek Singh, Madhukar Korupolu, Bhuvan Bamba, "SPARK: Integrated Resource Allocation in Heterogeneous SAN Data Centers", IBM Research Report RJ10407, Proceedings of Principles of Distributed Computing (PODC), 2007 (Brief Announcement).
  • Sandeep Uttamchandani, Kaladhar Voruganti, Ramani Routray, Li Yin, Aameek Singh, Benji Yolken, "BRAHMA: Planning Tool for Providing Storage Management as a Service", in Proceedings of IEEE International Conference on Services Computing (SCC), 2007.
  • Sandeep Gopisetty, Divyesh Jadav, Ramani Routray, "Agar: A Proactive and Reactive Data Center Configuration Change Analyzer", in Proceedings of IBM Academy Proactive Problem Prediction, Avoidance and Diagnosis Conference, 2007.
  • Prasenjit Sarkar, Ramani Routray, Eric Butler, Kaladhar Voruganti, Chung-hao Tan, Kiyoung Yang, "SPIKE : Best practice generation for storage area networks", in Proceedings of USENIX Second Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML), 2007.
  • Ramani Routray, Sandeep Gopisetty, Pallavi Galgali, Amit Modi, Shripad Nadgowda, "iSAN: Storage Area Network Management Modeling Simulation", in Proceedings of IEEE International Conference on Networking, Architecture, and Storage (NAS), 2007.
  • Andreas Dieberger, Eser Kandogan, Cheryl A. Kieliszewski, "Scalability in system management GUIs: a designer's nightmare", CHI Extended Abstracts, 2006.
  • Eric Butler, Roberto Pineiro, Ramani Routray, Prasenjit Sarkar, Chung-Hao Tan, Kaladhar Voruganti, "Automatically generating best practices for configuring storage area networks based on the analysis of configuration errors", in Proceedings of IBM Academy Proactive Problem Prediction, Avoidance and Diagnosis Conference, 2006.
  • Dakshi Agrawal, Eric Butler, Sandeep Gopisetty, Sudhir Koka, Kangwon Lee, Ramani Routray, Gauri Shah, Kaladhar Voruganti, "Pro-active and Re-active Best Practices Based SAN Configuration Management", in Proceedings of IBM Academy Best Practices Conference on Patterns for operational efficiency in distributed environments, 2006.
  • Eser Kandogan, Eben M. Haber, Rob Barrett, Allen Cypher, Paul P. Maglio, Haixia Zhao, "A1: end-user programming for web-based system administration", UIST 2005:211-220.
  • Aameek Singh, M. Korupolu, K. Voruganti, "Zodiac: Efficient Impact Analysis for Storage Area Networks", in Proceedings of USENIX File and Storage Technologies (FAST), 2005.
  • R. Jain, T. Mohan, R. Pineiro, R. Routray, G. Shah, Aameek Singh, A. Verma, K. Voruganti, "Atlantis: A Best Practices Based Storage Disaster Recovery Planner", in Proceedings of IBM Academy High Availability Best-Practices Conference, 2005.
  • Dakshi Agrawal, Stefan Jaquet, Madhukar Korupolu, Kang-won Lee, Kostas Magoutis, Ramani Routray, Gauri Shah, Gopalan Sivathanu, Brian Smith, Chung-Hao Tan, Sandeep Uttamchandani, Norbert Vogl, Kaladhar Voruganti, Li Yin, Omer Zaki, "SMaestro: Performance Aware Storage Infra-Structure Planner", in Proceedings of IBM Academy Performance Engineering Best Practices Topical Conference, 2005.
  • Aameek Singh, K. Voruganti, S. Gopisetty, D. Pease, L. Duyanovich, L. Liu, "Security vs. Performance: Tradeoffs using a Trust Framework", in Proceedings of IEEE/NASA Mass Storage Systems and Technologies (MSST), 2005.
  • Aameek Singh, K. Voruganti, S. Gopisetty, D. Pease, L. Liu, "A Hybrid Access Model for Storage Area Networks", in Proceedings of IEEE/NASA Mass Storage (MSST), 2005.
  • Aameek Singh, K. Voruganti, S. Gopisetty, A. Fleshler, R. Routray, C. Tan, "SANFS Maestro: Resource Planning for Enterprise Storage Area Network (SAN) File Systems", in Proceedings of International Conference on e-Business, Enterprise Information Systems, e-Government and outsourcing (EEE), 2005.

Selected Patents

  • US Patent No. 7386585: Systems and methods for storage area network design, Agrawal, Dakshi; Gopisetty, Sandeep K.; Lee, Kang-Won; Routray, Ramani R.; Verma, Dinesh; Voruganti, Kaladhar
  • US Patent No. 7548963: System and method for generating a multi-plan for a multi-layer storage area network, Devarakonda, Murthy V; George, David Alson; Gopisetty, Sandeep Kumar; Lee, Kang-Won; Magoutis, Konstantinos; Routray, Ramani Ranjan; Shah, Gauri; Tan, Chung-Hao; Vogl, Norbert George; Voruganti, Kaladhar
  • US Patent No. 7526540: System and method for assigning data collection agents to storage area network nodes in a storage area network resource management system, Gopisetty, Sandeep Kumar; Merbach, David Lynn; Sarkar, Prasenjit
  • US Patent No. 7397770: Checking and repairing a network configuration, Le, Cuong Minh; Shackelford, David Michael; Ratliff, James Mitchell; Voruganti, Kaladhar; Gopisetty, Sandeep; Basham, Robert Beverley; Verma, Dinesh C.; Lee, Kang-Won; Agrawal, Dakshi; Yardley, Brent William; Filali-Adib, Khalid
  • US Patent No. 7519624: Method for proactive impact analysis of policy-based storage systems, Korupolu, Madhukar R.; Singh, Aameek; Voruganti, Kaladhar
  • US Patent No. 7246254: System and method for automatically and dynamically optimizing application data resources to meet business objectives, Alur, Nagraj Ramachandran; Gogate, Vitthal M.; Narang, Inderpal Singh; Routray, Ramani Ranjan; Subramanian, Mahadevan
  • US Patent No. 7493300: Model and system for reasoning with N-step lookahead in policy-based system management, Palmer, John Davis; Uttamchandani, Sandeep Madhav; Voruganti, Kaladhar

People

  • Sandeep Gopisetty
  • Sandip Agarwala
  • Gabriel Alatorre
  • Luis D Bathen
  • Eric Butler
  • Eben Haber
  • Divyesh Jadav
  • Tara Matthews
  • Ramani R Routray
  • Aameek Singh
  • Chung-hao Tan

Product Impact

The vision of the TotalStorage Productivity Center (TPC) is to provide an integrated set of software offerings for end-to-end storage infrastructure management, from the host/application to the target storage device, in a heterogeneous platform environment. This vision is the realization of a research initiative to place IBM in a leadership position in storage management. The initiative focuses on creating comprehensive storage management software offerings that provide fabric, disk and tape subsystem configuration, performance and replication management, SAN fabric management, and host-centered usage of storage from the perspective of the database application or file system. Together, these provide the integrated storage network analytics and performance management tools that aid policy-based autonomic resource provisioning of storage networks. An additional goal is to provide integration points for business partners, for STG server management, and for automation/provisioning functions delivered through Tivoli Provisioning Manager and/or Intelligent Orchestrator.

Most impressive was the new "topology viewer" user interface, which provides a central location and graphical view of the storage environment. It enables users to monitor and troubleshoot problems quickly, and to access additional TPC tasks and functions without losing their orientation within the overall environment. Some of the features include:

The innovation did not stop with that single version. In the first quarter of 2007, with TPC 3.3, Research added significant new functions without creating an unwanted air of complexity.

The new features include:

Research helped enable intelligent performance optimization in TPC 4.1, improving disk optimization in enterprise environments and increasing the overall availability of storage networks and applications. Finally, in TPC 4.2, Research is adding new configuration planning capabilities to the TPC SAN Planner Wizard that extend to the IBM System Storage SAN Volume Controller and add resiliency management.