ICSE 2007 Workshop: Predictor Models in Software Engineering
(PROMISE)
----
Third International Workshop on Predictor Models in Software Engineering
(PROMISE 2007)
http://promisedata.org/2007
Sunday May 20, 2007
Minneapolis, Minnesota USA
In conjunction with 29th Int. Conf. on Software Engineering
http://web4.cs.ucl.ac.uk/icse07/
Objectives
As in any engineering field, realistic prior assessment
of the potential cost, problems, timing, performance,
safety, security, and numerous other properties of software
projects is essential for effective and efficient planning,
design, and implementation of those projects.
A mature engineering discipline needs to have a standard
set of predictive methods that practitioners can use, as
well as standards for interpreting the results of those
methods. To become widely accepted and used in the field,
models need to be validated on data from a wide range
of applications, in different development environments,
and with different reliability and performance goals.
The PROMISE workshop aims to broaden knowledge of
predictive models that have been successfully developed,
to provide a forum for the discussion of new models, and
to provide a catalog of system data that researchers can
use to evaluate proposed models, so that practitioners can
use these models to compare predicted results with those
of their own projects.
As a follow-up to last year's workshop, this workshop
focuses upon "issues and challenges surrounding building
predictive software models." Predictor models already exist
for software development effort and fault injections as
well as co-update or change predictors, software quality
estimators and software escalation ("escalation" predictors
try to guess what bug reports will require the attention of
the senior experts). However, in most cases they have been
presented in venues that cover a diverse set of interests.
Goals of the Workshop
The goals of this one-day workshop are:
- To expand the current public repository of data sets related to software engineering in order to conduct repeatable, refutable or improvable experiments. Such an empirical process is essential to the maturity of the field of predictive software models and software engineering in general. After only two years, the current PROMISE repository already contains 24 data sets.
- To deliver to the software engineering community useful, usable, and verified models or methods:
  - "Models" predict software properties of interest to 21st century software practitioners. Numerous such models are already under development, including models that predict software quality, development effort, requirements/design/code traceability, etc.
  - "Methods" are learning systems for building particular models for particular situations.
- To compile a list of open research questions that are deemed essential by the researchers in the field.
- To show, by example, to the next generation of software engineering researchers that empiricism is useful, practical, exciting, and insightful.
- To bring together researchers and practitioners with the aim of sharing experience and expertise.
- To steer discussion and debate on various aspects and issues related to building predictive software models.
Public Data Policy
PROMISE 2007 gives the highest priority to case studies,
experience reports, and presented results that are based
on publicly available datasets. To increase the chance
of acceptance, authors are urged to submit papers that
use such datasets. Data can come from anywhere including
the workshop Web site. Such papers should include the URL
address of the dataset(s) used.
A copy of the public datasets used in the accepted papers
will be posted on "The PROMISE Software Engineering
Repository." Therefore, if applicable, the authors should
obtain the necessary permission to donate the data prior
to submitting their paper. All donors will be acknowledged
on the PROMISE repository Web site.
The use of publicly available datasets will facilitate
generation of repeatable, verifiable, refutable, and
improvable results, as well as providing an opportunity
for researchers to test and develop their hypotheses,
algorithms, and ideas on a diverse set of software
systems. Examples of such datasets can be found at
http://promisedata.org/repository
We ask all researchers in the field to assist us with
expanding the PROMISE repository by donating their data
sets. For inquiries regarding data donation please send
an email to mail@promisedata.org
Topics of Interest
In line with the above-mentioned goals, the main topics
of interest include:
- Applications of predictive models to software engineering data.
  - What predictive models can be learned from software engineering data?
- Strengths and limitations of predictive models.
- Empirical model evaluation techniques.
  - What are the best baseline models for different classes of predictive software models?
  - Are existing measures and techniques to evaluate and compare model goodness, such as precision, recall, error rate, or ROC analysis, adequate for evaluating software models? Or are more specific measures geared toward the software engineering domain needed?
  - Are certain measures better suited for certain classes of models?
  - What are the appropriate techniques to test the generated models, e.g. hold-out, cross-validation, or chronological splitting?
- Field evaluation challenges and techniques.
  - What are the best practices in evaluating the generated software models in the real world?
  - What are the obstacles in the way of field testing a model in the real world?
  - How to overcome obstacles in the acceptance of predictive models in the real world?
  - How to test the generated models?
  - What predictive models are more prone to model shift (concept drift)?
  - When does a model need to be replaced?
  - What are the best approaches to keeping the model in sync with software changes?
- Building models using machine learning, statistical methods, and other methods.
  - How do these techniques lend themselves to building predictive software models?
  - Are some methods better suited for certain classes of models?
  - How do these algorithms scale up when handling very large amounts of data?
  - What are the challenges posed by the nature of data stored in software repositories that make certain techniques less effective than others?
- Cost-benefit analysis of predictive models.
  - Is cost-benefit analysis a necessary step in evaluating all predictive models?
  - What are the requirements for one to be able to perform a cost-benefit analysis?
  - What particular costs and benefits should be considered for these models?
- Case studies on building predictive software models.
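For readers less familiar with the evaluation vocabulary used in the topics above, the following is a minimal sketch, in Python, of the three testing techniques named there (hold-out, cross-validation, chronological splitting) and of the measures precision, recall, false alarm rate, and error rate. It is an illustration only, not a method prescribed by the workshop; all function names, default fractions, and the toy labels are assumptions made for the example.

```python
import random

def holdout_split(n, test_fraction=0.33, seed=0):
    """Shuffle indices 0..n-1 once and hold out a fraction for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)

def kfold_splits(n, k=3, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def chronological_split(timestamps, train_fraction=0.67):
    """Train on the oldest records and test on the newest (no shuffling)."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    cut = int(len(order) * train_fraction)
    return order[:cut], order[cut:]

def confusion_measures(actual, predicted):
    """Compute common measures from binary labels (truthy = defective)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),       # also called probability of detection
        "false_alarm": ratio(fp, fp + tn),  # probability of false alarm
        "error_rate": ratio(fp + fn, len(pairs)),
    }

if __name__ == "__main__":
    # Toy labels, invented for this example.
    actual = [1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(confusion_measures(actual, predicted))
    print(chronological_split([5, 1, 3, 2, 4, 6], train_fraction=0.5))
```

Chronological splitting differs from the other two in that it never lets a model train on records newer than those it is tested on, which is one reason it is listed separately above.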
Benchmark Dataset Papers
To encourage data sharing and/or publicize new and
challenging research directions, a special category of
papers will be considered for inclusion in the workshop.
Papers submitted under this category should at least
include the following information:
- The public URL to a new dataset
- Background notes on the domain
  - What problem does the data represent?
  - What would be gained if the problem were solved?
- A proposed measure of goodness to be used to judge the results; for instance, a good defect detector has a high probability of detection and a low probability of false alarm.
- A review of current work in the field (e.g. what is wrong with current solutions or why has no one solved this problem before?)
- Description of data format. Recommended format is Attribute-Relation File Format (ARFF):
  http://www.cs.waikato.ac.nz/~ml/weka/arff.html
  For an example of such a dataset see "Cocomo NASA/Software cost estimation" on the "PROMISE Software Engineering Repository":
  http://promisedata.org/repository
  However, if ARFF is not an appropriate format for your data, please provide a detailed description of your data format in the paper. A guideline from the UCI Machine Learning repository for documenting datasets can be found at:
  ftp://ftp.ics.uci.edu/pub/machine-learning-databases/DOC-REQUIREMENTS
This information is placed before the actual data
when using ARFF format. However, if you are using an
alternative format that does not support comments in
the dataset, provide this information in a separate file
with extension .desc, and submit the URL of this file.
- Preferably some baseline results
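To illustrate the recommended layout, in which the descriptive notes appear as comments before the actual data, the following is a minimal Python sketch that renders a small ARFF file. It is an example under stated assumptions, not an official tool of the workshop: the function name, the relation name, the attributes, and the rows are all hypothetical and do not come from any PROMISE dataset. The sketch does not quote names or values containing spaces or commas, which real datasets may require.

```python
def to_arff(relation, description_lines, attributes, rows):
    """Render a dataset as ARFF text with the description as leading comments.

    attributes is a list of (name, kind) pairs, where kind is either an ARFF
    type keyword such as "numeric" or a list of nominal values.
    """
    lines = ["% " + text for text in description_lines]  # ARFF comments start with %
    lines.append("@relation " + relation)
    lines.append("")
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(kind) + "}"  # nominal attribute
        lines.append("@attribute {} {}".format(name, kind))
    lines.append("")
    lines.append("@data")
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical toy data, invented for this example.
    text = to_arff(
        relation="toy_defects",
        description_lines=[
            "Title: toy defect data (hypothetical)",
            "Source: invented for illustration",
        ],
        attributes=[
            ("lines_of_code", "numeric"),
            ("cyclomatic_complexity", "numeric"),
            ("defective", ["no", "yes"]),
        ],
        rows=[(120, 7, "no"), (250, 31, "yes")],
    )
    print(text)
```

For a format that does not support comments, the same description lines could instead be written to the separate .desc file mentioned above.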
Submission Process
Submissions should be five to ten pages long (ten pages maximum). Papers must
be original and previously unpublished. SUBMISSIONS WHICH
INCLUDE EMPIRICAL RESULTS BASED ON PUBLICLY ACCESSIBLE
DATASETS WILL BE GIVEN THE HIGHEST PRIORITY.
Accepted papers and other materials for the Proceedings
must be revised to conform to IEEE style guidelines
defined at:
http://www.computer.org/portal/site/ieeecs/menuitem.c5efb9b8ade9096b8a9ca0108bcd45f3/index.jsp?&pName=ieeecs_level1&path=ieeecs/publications/cps&file=cps_format1.xml&xsl=generic.xsl&
Templates for submissions are found at:
- Latex: http://promisedata.org/2007/style/latex/
- Word: http://promisedata.org/2007/style/word/
Accepted file formats are PostScript and PDF. The details of the paper and data submission process are available at:
http://promisedata.org/2007/CFP.html
To submit papers:
- Email them to: 2007@promisedata.org
- Make the title of that email "[SUBMISSION]: your paper title"
Each paper will be reviewed by the program committee in terms of its technical content and its relevance to the scope of the workshop, as well as its ability to stimulate discussion. At least one author of each accepted paper is required to register and attend the workshop. Prior to the workshop, the accepted papers will be posted on the workshop web page at:
http://promisedata.org/2007
This is to facilitate a more fruitful discussion during the workshop.
Journal of Empirical Software Engineering: Special Issue
Papers accepted to PROMISE 2007 (and 2006) will be
eligible for submission to a special issue of the Journal
of Empirical Software Engineering on repeatable experiments
in software engineering.
The issue will be edited by Tim Menzies.
Important Dates
Submission of workshop papers      January 20, 2007
Notification of workshop papers    February 10, 2007
Publication-ready copy             March 5, 2007
General Chair
Gary Boetticher Univ. of Houston - Clear Lake
Steering Committee
Gary Boetticher Univ. of Houston - Clear Lake
Tim Menzies West Virginia University, US
Tom Ostrand AT&T
Program Committee
Vic Basili University of Maryland, US
Dan Berry University of Waterloo, Canada
Barry Boehm University of Southern California
Gary Boetticher Univ. of Houston - Clear Lake, US
Lionel Briand Carleton University, Canada
Bojan Cukic West Virginia University, USA
Alex Dekhtyar University of Kentucky, US
Martin Feather NASA JPL, US
Norman Fenton Queen Mary (U. of London), UK
Jane Hayes University of Kentucky, USA
Jairus Hihn NASA JPL's Deep Space Network, US
Gunes Koru U. of Maryland, Balt. Cty, US
Tim Menzies West Virginia University, US
Martin Neil Queen Mary (U. of London), UK
Allen Nikora NASA JPL, US
Tom Ostrand AT&T, US
Daniel Port University of Hawaii, USA
Julian Richardson NASA ARC, US
Guenther Ruhe University of Calgary, Canada
Martin Shepperd Brunel University, UK
Forrest Shull Fraunhofer Centre Maryland, USA
Willem Visser NASA ARC, US
Elaine Weyuker AT&T, US
Laurie Williams North Carolina State Univ., USA
Marv Zelkowitz University of Maryland, US
Du Zhang Cal. State Univ., Sacramento, USA