The present disclosure is generally directed to data processing and more particularly to testcases. Still more particularly, the present disclosure is directed to techniques for automatically generating new testcases for testing a software system from existing testcases based on noun-verb pairings.
Watson was originally designed as a question answering (QA) system (i.e., a data processing system) that applied advanced natural language processing, information retrieval, knowledge representation, automated reasoning, and machine learning technologies to the field of open domain question answering. In general, document search technology receives a keyword query and returns a list of documents, ranked in order of relevance to the query (often based on popularity and page ranking). In contrast, QA technology receives a question expressed in natural language, seeks to understand the question in greater detail than document search technology, and returns a precise answer to the question.
The original Watson system reportedly employed more than one hundred different algorithms to analyze natural language, identify sources, find and generate hypotheses, find and score evidence, and merge and rank hypotheses. The original Watson system implemented DeepQA™ software and the Apache™ unstructured information management architecture (UIMA) framework. Software for the original Watson system was written in various languages, including Java, C++, and Prolog, and ran on the SUSE™ Linux Enterprise Server 11 operating system using the Apache Hadoop™ framework to provide distributed computing. As is known, Apache Hadoop is an open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware.
The original Watson system employed DeepQA software to generate hypotheses, gather evidence (data), and analyze the gathered data. The original Watson system was workload optimized and integrated massively parallel POWER7® processors. The original Watson system included a cluster of ninety IBM Power 750 servers, each of which included thirty-two 3.5 GHz POWER7 cores (four eight-core processors), with four threads per core. In total, the original Watson system had 2,880 POWER7 processor cores and 16 terabytes of random access memory (RAM). Reportedly, the original Watson system could process 500 gigabytes, the equivalent of a million books, per second. Sources of information for the original Watson system included encyclopedias, dictionaries, thesauri, newswire articles, and literary works. The original Watson system also used databases, taxonomies, and ontologies.
In software engineering, a testcase is a set of conditions or variables under which a tester determines whether an application or software system (or a feature of the application or the software system) functions as designed. Testcases are often referred to as test scripts or test automation code and are usually collected into test suites. A test oracle (e.g., a requirement, a use case, or a heuristic) provides a mechanism for determining whether an application or software system has passed or failed a test. Many different testcases may be employed to determine whether an application or software system is sufficiently tested prior to release.
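As a minimal sketch of these terms, the following Python fragment models a testcase as a set of input conditions paired with a test oracle, i.e., a function that decides whether the observed result passes. The names `Case`, `run_suite`, and `apply_discount` are hypothetical and are chosen only for illustration; they do not describe any particular test framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical feature under test: reduce a price by a percentage."""
    return round(price * (1 - percent / 100), 2)


@dataclass
class Case:
    """One testcase: named conditions plus an oracle that judges the result."""
    name: str
    conditions: Dict[str, Any]      # inputs under which the feature is exercised
    oracle: Callable[[Any], bool]   # returns True if the observed result passes


def run_suite(feature: Callable[..., Any], suite: List[Case]) -> Dict[str, bool]:
    """Run every testcase in the suite and record a pass/fail verdict for each."""
    return {case.name: case.oracle(feature(**case.conditions)) for case in suite}


# A small test suite (a collection of testcases) for the hypothetical feature.
suite = [
    Case("ten_percent_off", {"price": 200.0, "percent": 10}, lambda r: r == 180.0),
    Case("no_discount", {"price": 50.0, "percent": 0}, lambda r: r == 50.0),
]
results = run_suite(apply_discount, suite)
```

Here `results` maps each testcase name to a Boolean verdict, so a release decision could, for example, require that every value in the mapping be true.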
In order to fully test that all application requirements are met, usually at least two testcases (i.e., a positive test and a negative test) are needed for each requirement. If a requirement has sub-requirements, each sub-requirement must also usually have at least two testcases. Tracking a link between a requirement and a test is frequently performed using a traceability matrix. Written testcases usually include a description of the functionality to be tested and the preparation required to ensure that the test can be conducted. A formal written testcase is characterized by a known input and by a predetermined expected output. The known input usually tests a precondition and the expected output usually tests a post-condition.
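The pairing of positive and negative testcases, and the link from a requirement to its tests, may be sketched as follows. This is an illustrative example only: the `withdraw` function, the requirement identifier `REQ-1`, and the helper `under_covered` are assumptions introduced here and do not appear in any referenced system.

```python
from typing import Callable, Dict, List


def withdraw(balance: int, amount: int) -> int:
    """Hypothetical feature under test: withdraw an amount from a balance."""
    if amount <= 0 or amount > balance:
        raise ValueError("invalid withdrawal amount")
    return balance - amount


def positive_withdraw_case() -> bool:
    # Known input satisfies the precondition (0 < amount <= balance);
    # the expected output checks the post-condition (balance reduced by amount).
    return withdraw(100, 30) == 70


def negative_withdraw_case() -> bool:
    # Known input violates the precondition; the expected outcome is a rejection.
    try:
        withdraw(100, 500)
    except ValueError:
        return True
    return False


# Traceability matrix: requirement identifier -> testcases that cover it.
# "REQ-1" is a made-up identifier used only for this sketch.
traceability: Dict[str, List[Callable[[], bool]]] = {
    "REQ-1": [positive_withdraw_case, negative_withdraw_case],
}


def under_covered(matrix: Dict[str, List[Callable[[], bool]]], minimum: int = 2) -> List[str]:
    """List requirements that have fewer than the minimum number of testcases."""
    return [req for req, cases in matrix.items() if len(cases) < minimum]


# Execute every linked testcase and collect the verdicts per requirement.
verdicts = {req: [case() for case in cases] for req, cases in traceability.items()}
```

A matrix of this kind makes it straightforward to flag a requirement or sub-requirement that lacks either its positive or its negative testcase before a test run begins.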
For applications or software systems without formal requirements, testcases can be written based on accepted normal operation of programs of a similar class. In certain instances, testcases are not written but activities and results are reported after tests have been executed. In scenario testing, scenarios or hypothetical stories are used to aid a tester in thinking through a complex problem. Scenarios may be as simple as a diagram for a testing environment or a description written in prose. Scenarios usually differ from testcases in that a testcase is typically a single step, while a scenario may cover a number of steps.
Machine learning is a scientific discipline that deals with the construction and study of algorithms that learn from data. Machine learning algorithms operate by building a model based on inputs and using the model to make predictions or decisions, as contrasted with only following explicit program instructions. Machine learning has been employed in various computing tasks, e.g., where designing and programming explicit, rule-based algorithms is not practical. Machine learning tasks may be supervised or unsupervised. In supervised learning, a data processing system may be presented with example inputs and associated desired outputs with the goal of learning a general rule that maps inputs to outputs. Spam filtering is one example of supervised learning (in particular, classification), where a learning algorithm is presented with email (or other) messages labeled as ‘spam’ or ‘not spam’ to produce a program that independently determines whether subsequent messages are spam. In unsupervised learning, a learning algorithm is given no labeled examples and must find structure in its inputs on its own, i.e., without a so-called ‘trainer’. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end. Topic modeling is one example of unsupervised learning, where a program is given a set of human language documents and is tasked with determining which documents cover similar topics.
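The spam filtering example of supervised learning may be illustrated with a short sketch. The fragment below uses a naive Bayes classifier with add-one smoothing, which is one common choice assumed here for illustration and is not a technique named in the foregoing description. The function names `train` and `classify` and the four training messages are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]


def train(examples: List[Tuple[str, str]]) -> Model:
    """Build a model by counting words per label from (message, label) pairs."""
    word_counts: Dict[str, Counter] = {}
    label_counts: Counter = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(text: str, model: Model) -> str:
    """Return the label with the highest log-probability for the message."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        score = math.log(count / total)  # prior probability of the label
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                # Add-one smoothing keeps unseen (word, label) pairs above zero.
                score += math.log((word_counts[label][word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Labeled example inputs and desired outputs (made-up data for this sketch).
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("please review the project report", "not spam"),
]
model = train(training)
prediction = classify("claim your free prize", model)
```

With this training data, `prediction` is `'spam'`, since every word of the new message occurs only in messages labeled as spam. The model is built entirely from the labeled examples rather than from hand-written rules, which is the distinction drawn above between learned and explicitly programmed behavior.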