(CFP): 2nd Int. Workshop - PredictOr Models In Software
Engineering (PROMISE)
Second International Workshop on Predictor Models In Software
Engineering
(PROMISE 2006)
http://unbox.org/promise/2006
Sunday September 24, 2006
Philadelphia, Pennsylvania USA
In conjunction with the 22nd IEEE Int. Conf. on Software Maintenance
http://icsm2006.cs.drexel.edu/
Objectives
There is no doubt software engineering (SE) is alive and well.
The U.S. IT industry showed an 8 percent growth rate in
2005. Employment levels are now 17 percent higher than in 1999,
5 percent higher than during the dot.com bubble of 2000. New
software engineering courses are springing up all over the US
as industry demands graduates with SE knowledge. Similar trends are
reported around the world [Communications of the ACM, Vol. 48,
No. 9, Sept 2005, page 26].
As we enter this second renaissance of software engineering, we vow
to never again indulge in the hype and unfounded speculation of the
dot.com boom. Software management decisions must be based on
well-understood and well-supported predictive models. Ideally,
such models should provide reliable and accurate guidance for
the many decisions a manager faces. Ultimately, these
models will help the software manager better understand software
engineering.
A good model should be a generalization of real-world data. But
where does the data come from? Collecting data from real world
software engineering projects is problematic. Software projects
are notoriously difficult to control and corporations are often
reluctant to expose their own software development record to public
scrutiny.
Since data is difficult to obtain, we need to make better use of
whatever data is available.
For example, we need to know:
- How much do we need to know about software engineering in order to build effective models?
- How to adapt models to new data?
- How to streamline the data or the process?
- How can predictive modeling gain greater acceptance by the SE community?
These are the questions addressed by this workshop.
As a follow-up to last year's workshop, this workshop focuses upon
"issues and challenges surrounding building predictive software
models." Predictor models already exist for software development
effort and fault injection, as well as co-update or change
predictors, software quality estimators, and software escalation
predictors (which try to guess which bug reports will require
the attention of senior experts). However, in most cases they
have been presented in venues that cover a diverse set of interests.
Goals of the Workshop
The goals of this one-day workshop are:
- To bring together researchers and practitioners from various backgrounds with an interest in building predictive models, with the aim of sharing experience and expertise.
- To steer discussion and debate on various aspects and issues related to building predictive software models.
- To initiate the generation of a publicly available repository of software engineering data sets. We believe such a repository is essential to the maturity of the field of predictive software models and software engineering in general.
- To put together a list of open research questions that are deemed essential by the researchers in the field.
Public Data Policy
PROMISE 2006 GIVES THE HIGHEST PRIORITY TO CASE STUDIES, EXPERIENCE
REPORTS, AND PRESENTED RESULTS THAT ARE BASED ON PUBLICLY AVAILABLE
DATASETS. TO INCREASE THE CHANCE OF ACCEPTANCE, AUTHORS ARE URGED
TO SUBMIT PAPERS THAT USE SUCH DATASETS. DATA CAN COME FROM ANYWHERE
INCLUDING THE WORKSHOP WEB SITE. SUCH PAPER SUBMISSIONS SHOULD
INCLUDE THE URL ADDRESS OF THE DATASET(S) USED.
A COPY OF THE PUBLIC DATASETS USED IN THE ACCEPTED PAPERS WILL BE
POSTED ON "THE PROMISE SOFTWARE ENGINEERING REPOSITORY." THEREFORE,
IF APPLICABLE, THE AUTHORS SHOULD OBTAIN THE NECESSARY PERMISSION
TO DONATE THE DATA BEFORE SUBMITTING THEIR PAPER. ALL DONORS WILL BE
ACKNOWLEDGED ON THE PROMISE REPOSITORY WEB SITE.
The use of publicly available datasets will facilitate generation
of repeatable, verifiable, refutable, and improvable results,
as well as providing an opportunity for researchers to test and
develop their hypotheses, algorithms, and ideas on a diverse set
of software systems. Examples of such datasets can be found at
http://promise.site.uottawa.ca
or the "PROMISE SOFTWARE ENGINEERING REPOSITORY" at
http://promise.site.uottawa.ca/SERepository.
We ask all researchers in the field to assist us with expanding the
PROMISE repository by donating their data sets. For inquiries
regarding data donation, please send an email to promise@unbox.org.
Topics of Interest
In line with the above-mentioned goals, the main topics of interest
include:
- Applications of predictive models to software engineering data.
  . What predictive models can be learned from software engineering data?
- Strengths and limitations of predictive models.
- Empirical model evaluation techniques.
  . What are the best baseline models for different classes of predictive software models?
  . Are existing measures and techniques to evaluate and compare model goodness, such as precision, recall, error rate, or ROC analysis, adequate for evaluating software models? Or are more specific measures geared toward the software engineering domain needed?
  . Are certain measures better suited for certain classes of models?
  . What are the appropriate techniques to test the generated models (e.g., hold-out, cross-validation, or chronological splitting)?
- Field evaluation challenges and techniques.
  . What are the best practices in evaluating the generated software models in the real world?
  . What are the obstacles in the way of field testing a model in the real world?
  . How to overcome obstacles in the acceptance of predictive models in the real world?
- Model shifting (concept drift).
  . When does a model need to be replaced?
  . What are the best approaches to keeping the model in sync with the changes in the software?
  . What predictive models are more prone to model shift?
- Building models using machine learning, statistical methods, and other methods.
  . How do these techniques lend themselves to building predictive software models?
  . Are some methods better suited for certain classes of models?
  . How do these algorithms scale up when handling very large amounts of data?
  . What are the challenges posed by the nature of data stored in software repositories that make certain techniques less effective than others?
- Cost-benefit analysis of predictive software models.
  . Is cost-benefit analysis a necessary step in evaluating all predictive models?
  . What are the requirements for one to be able to perform a cost-benefit analysis?
  . What particular costs and benefits should be considered for these models?
- Case studies on building predictive software models.
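The evaluation topic above names hold-out and chronological splitting as candidate test techniques. The following is a minimal sketch of how the two differ, not a recommended procedure. The record layout (a "date" field holding an ISO date string), the function names, and the 25 percent test fraction are all assumptions made for illustration.

```python
import random

def holdout_split(records, test_fraction=0.25, seed=1):
    """Random hold-out: shuffle, then cut. Ignores when each record was created."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

def chronological_split(records, test_fraction=0.25):
    """Chronological split: train on the oldest records, test on the newest."""
    # ISO date strings (YYYY-MM-DD) sort correctly as plain strings.
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(round(len(ordered) * (1 - test_fraction)))
    return ordered[:cut], ordered[cut:]

# Hypothetical change records (made-up values, not real project data).
records = [
    {"date": "2005-03-01", "defective": False},
    {"date": "2005-01-15", "defective": True},
    {"date": "2005-06-20", "defective": False},
    {"date": "2005-02-10", "defective": False},
    {"date": "2005-09-05", "defective": True},
    {"date": "2005-04-12", "defective": False},
    {"date": "2005-11-30", "defective": True},
    {"date": "2005-08-18", "defective": False},
]

train, test = chronological_split(records)
random_train, random_test = holdout_split(records)
```

With a chronological split, every training record predates every test record, which mimics predicting future changes from past ones. A random hold-out gives no such guarantee.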
Benchmark Dataset Papers
To encourage data sharing and/or publicize new and challenging
research directions, a special category of papers will be considered
for inclusion in the workshop. Papers submitted under this category
should at least include the following information:
- The public URL to a new dataset
- Background notes on the domain
- What problem does the data represent?
- What would be gained if the problem was solved?
- A proposed measure of goodness to be used to judge the results; for instance, a good defect detector has a high probability of detection and a low probability of false alarm.
- A review of current work in the field (e.g. what is wrong with current solutions or why has no one solved this problem before?)
- Description of data format.
  . Recommended format is Attribute-Relation File Format (ARFF)
    http://www.cs.waikato.ac.nz/~ml/weka/arff.html
    For an example of such a dataset see "Cocomo NASA/Software cost estimation" on the "PROMISE Software Engineering Repository"
    http://promise.site.uottawa.ca/SERepository
    However, if ARFF is not an appropriate format for your data, please provide a detailed description of your data format in the paper.
  . A guideline from the UCI Machine Learning repository for documenting datasets can be found at
    ftp://ftp.ics.uci.edu/pub/machine-learning-databases/DOC-REQUIREMENTS
    This information is placed before the actual data when using ARFF
    format. However, if you are using an alternative format that does
    not support comments in the dataset, provide this information in
    a separate file with extension .desc, and submit the URL of this
    file.
- Preferably some baseline results
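The list above gives probability of detection and probability of false alarm as an example measure of goodness for a defect detector. As a minimal sketch, assuming the usual confusion-matrix definitions (pd = TP / (TP + FN), pf = FP / (FP + TN)), the two values could be computed as follows. The function name and the sample labels are hypothetical.

```python
def pd_pf(actual, predicted):
    """Return (pd, pf) for paired boolean labels, where True means 'defective'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # defects correctly flagged
    fn = sum(1 for a, p in pairs if a and not p)      # defects missed
    fp = sum(1 for a, p in pairs if not a and p)      # false alarms
    tn = sum(1 for a, p in pairs if not a and not p)  # clean modules left alone
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf

# Made-up labels for eight modules (illustrative only, not real project data).
actual    = [True, True, True,  False, False, False, False, True]
predicted = [True, True, False, False, True,  False, False, True]

pd, pf = pd_pf(actual, predicted)
print(f"pd={pd:.2f} pf={pf:.2f}")
```

A detector is better the closer pd is to 1 and pf is to 0. Reporting both values together keeps a detector that flags every module from looking good on pd alone.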
Submission Process
Submissions are limited to five pages. Papers must be original and
previously unpublished. SUBMISSIONS WHICH INCLUDE EMPIRICAL RESULTS
BASED ON PUBLICLY ACCESSIBLE DATASETS WILL BE GIVEN THE HIGHEST
PRIORITY.
Accepted papers and other materials for the Proceedings must be
revised to conform to IEEE style guidelines defined at:
http://www.computer.org/portal/site/ieeecs/menuitem.c5efb9b8ade9096b8a9ca0108bcd45f3/index.jsp?&pName=ieeecs_level1&path=ieeecs/publications/cps&file=cps_format1.xml&xsl=generic.xsl&
Templates for submissions are found at:
- Latex: http://www.unbox.org/promise/2006/style/latex/
- MS Word: http://www.unbox.org/promise/2006/style/word/
Accepted file formats are Postscript and PDF. The details of paper
and data submission process are available at:
http://www.unbox.org/promise/2006/site.php?what=cfp&title=Call%20for%20papers
To submit papers:
- Email them to: promise@unbox.org
- Make the title of that email "[SUBMISSION]: your paper title"
Each paper will be reviewed by the program committee in terms of
its technical content and its relevance to the scope of the
workshop, as well as its ability to stimulate discussion. At least
one author of each accepted paper is required to register for and
attend the workshop.
Prior to the workshop the accepted papers will be posted on the
workshop web page at:
http://unbox.org/promise/2006
This is to facilitate a more fruitful discussion during the workshop.
Journal of Empirical Software Engineering: Special Issue
Papers accepted to PROMISE 2006 will be eligible for submission to a
special issue of the Journal of Empirical Software Engineering.
The theme of the issue will be "Repeatable Experiments in
Software Engineering."
The issue will be edited by Tim Menzies.
Important Dates
Submission of workshop papers:    June 19, 2006
Notification of workshop papers:  July 14, 2006
Publication-ready copy:           August 14, 2006
Workshop Co-Chairs
Gary D. Boetticher University of Houston-Clear Lake, US
Tim Menzies West Virginia University, US
Program Committee
Vic Basili University of Maryland, US
Dan M. Berry University of Waterloo, Canada
Barry Boehm University of Southern California, US
Gary D. Boetticher University of Houston-Clear Lake, US
Lionel Briand Carleton University, Canada
Bojan Cukic West Virginia University, US
Bill Curtis Borland Corporation, US
Alexander Dekhtyar University of Kentucky, US
Martin S. Feather NASA JPL, US
Norman Fenton Queen Mary (University of London), UK
A. Gunes Koru University of Maryland, Baltimore County, US
Jane Hayes University of Kentucky, US
Jairus Hihn NASA JPL, US
Ross Jeffery University of New South Wales, AU
Taghi M. Khoshgoftaar Florida Atlantic University, US
Tim Menzies West Virginia University, US
Martin Neil Queen Mary (University of London), UK
Allen P. Nikora NASA JPL, US
Daniel N. Port University of Hawaii, US
Julian Richardson NASA ARC, US
Guenther Ruhe University of Calgary, Canada
Jelber Sayyad-Shirabad University of Ottawa, Canada
Martin Shepperd Brunel University, UK
Forrest Shull Fraunhofer Center -- Maryland, US
Willem Visser NASA ARC, US
Laurie Williams North Carolina State University, US
Marvin Zelkowitz University of Maryland, US