Workshop on Managing Systems Automatically and Dynamically (MAD)

Final Submission Deadline
July 6, 2012

Submitted by Greg Bronevetsky
At the USENIX Symposium on Operating Systems Design and Implementation (OSDI)
October 8-10, 2012
Hollywood, CA, USA

* Full paper submission due: Friday, July 6, 2012
* Notification of acceptance: Friday, August 10, 2012
* Final papers due: Wednesday, September 12, 2012

The complexity of modern systems makes them extremely challenging to manage.
From highly heterogeneous desktop environments to large-scale systems that
consist of many thousands of software and hardware components, these systems
exhibit a wide range of complex behaviors that are difficult to predict. As
such, although the raw computational capability of these systems grows each
year, much of it is lost to (i) complex failures that are difficult to
localize and (ii) poor performance and efficiency resulting from system
configurations that are inappropriate for the user's workload. The MAD
workshop (an extended
follow-on of the SLAML workshop) focuses on techniques to make complex
systems manageable, addressing the problem’s three major aspects:

System Monitoring
Systems report their state and behavior using a wide range of mechanisms.
System and application logs include reports of key events that occur within
software or hardware components. Performance counters measure various OS and
hardware-level metrics (e.g. packets sent or cache misses) within a given time
period. Further, information from source code version control systems or
request traces can help identify the source of failures or poor performance.
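As a minimal illustration of counter-based monitoring (not part of the call
itself; the counter values below are invented), performance counters such as
packets sent or cache misses are typically exposed as cumulative totals, so a
metric "within a given time period" is computed by differencing two samples:

```python
def rate(sample_start, sample_end, interval_s):
    """Convert two cumulative counter samples into a per-second rate."""
    return (sample_end - sample_start) / interval_s

# Hypothetical cumulative packet counters sampled 10 seconds apart.
pkts_t0, pkts_t1 = 1000, 6000
print(rate(pkts_t0, pkts_t1, 10.0))  # prints 500.0
```

Real collectors read such counters from OS interfaces (e.g., /proc on Linux)
or hardware performance-monitoring units; the arithmetic is the same.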

Data Analysis
Data produced by monitoring can be analyzed using a variety of techniques to
understand the system state and predict its behavior in various possible
scenarios. Traditionally this consisted of system administrators manually
inspecting system logs or using explicit pattern-matching rules to identify key
events. Recent research has also focused on statistical and machine learning
techniques to automatically identify behavioral patterns. Finally, the data can
be presented directly to system administrators. Because of the data's large
volume, such displays rely on aggregation techniques that show maximal
information in minimal space.
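As a toy sketch of the traditional approach (the log lines and rules below
are hypothetical), explicit pattern-matching rules flag key events that an
administrator has encoded by hand; the statistical and machine learning work
mentioned above aims to discover such event patterns automatically:

```python
import re
from collections import Counter

# Hypothetical log excerpt; real system logs vary widely in format.
LOG = """\
2012-07-01 10:02:11 node17 kernel: EXT3-fs error: unable to read inode
2012-07-01 10:02:12 node17 sshd: accepted connection from 10.0.0.5
2012-07-01 10:05:40 node23 mmfs: disk lease expired, panicking
"""

# Hand-written rules mapping an event name to a regular expression.
RULES = {
    "fs_error": re.compile(r"EXT3-fs error"),
    "lease_expired": re.compile(r"disk lease expired"),
}

def match_events(log):
    """Count how often each rule fires across the log lines."""
    hits = Counter()
    for line in log.splitlines():
        for name, pattern in RULES.items():
            if pattern.search(line):
                hits[name] += 1
    return hits

print(match_events(LOG))
```

The per-event counts are themselves a simple aggregation: thousands of raw
lines collapse into a handful of numbers an administrator can scan quickly.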

Informed Action
The analyses and visualizations are used by operators to select the best action
to improve productivity or localize and resolve system failures. The possible
actions include restarting processes, rebooting servers, rolling back
application updates or reconfiguring system components. Since the choice of the
best action is complex, it requires assistance from additional analysis tools
to predict the productivity of any given configuration on the given workload.

MAD seeks original early work on system management, including position papers
and work-in-progress reports that will mature to be published at high-quality
conferences. Papers are expected to demonstrate a strong foundation in the
needs of the system management community and be positioned within the broader
context of related work. In addition to technical merit, papers will be
selected to encourage discussion at the workshop and among members of the
general system management community.

Topics include but are not limited to:
* Techniques to collect metric and log data, including tracing and
statistical sampling
* Large-scale aggregation of metric and log data
* Reports on publicly available sources of sample logs or system metrics

* Automated analysis of system logs and metrics using statistical, machine
learning, or natural language processing techniques
* Visualization of system information in a way that leads administrators to
actionable insights
* Evaluation of the quality of learned models, including assessing the
confidence/reliability of models and comparisons between different methods

* Applications of log and metric analysis to address reliability, performance,
power management, security, fault diagnosis, scheduling, or manageability
* Challenges of scale in applying machine learning to large systems
* Integration of machine learning into real-world systems and processes

Workshop Organizers
Peter Bodik, Microsoft Research (peterb@microsoft.com)
Greg Bronevetsky, Lawrence Livermore National Laboratory (bronevetsky@llnl.gov)

Submitted papers must be no longer than six 8.5″x11″ or A4 pages, using a 10
point font on 12 point (single spaced) leading, with a maximum text block of
6.5 inches wide by 9 inches deep. The page limit includes everything except
for references, for which there is no limit. The use of color is acceptable,
but the paper should be easily readable if viewed or printed in gray scale.
Authors must make a good faith effort to anonymize their submissions, and they
should not identify themselves either explicitly or by implication (e.g.,
through the references or acknowledgments). Submissions violating the detailed
formatting and anonymization rules on the Web site will not be considered for
publication. Authors who are not sure about anonymization or whether their
paper fits into MAD should contact the MAD chairs. There will be no extensions
for reformatting. Papers will be held in full confidence during the reviewing
process, but papers accompanied by nondisclosure agreement forms are not
acceptable and will be rejected without review. Authors of accepted papers will
be expected to supply electronic versions of their papers and encouraged to
supply source code and raw data to help others replicate and better understand
their results.