Data science applied to BPM
At Bonitasoft, the R&D team is currently working on some cool stuff using data science, artificial intelligence, and business process management together.
Data science is a relatively new practice that leverages mathematics, statistics and data visualization. It emerged with the growing volume of data generated by systems over the last decade, a phenomenon known as "BIG DATA".
By exploring huge amounts of data, one is able to discover and understand complex trends and behaviours. By better understanding data, one can make smarter decisions.
For example, Netflix analyzes all the data available to discover its users’ habits and interests. Thanks to advanced analytics techniques, it is able to guess what else its users might be interested in watching. Netflix's suggestions are possible because of powerful algorithms that leverage a large amount of data from a user’s search history and watch list.
This first example of BIG DATA usage can be complemented with examples from Amazon, Google, Domino's Pizza (yes, pizza makers are also leveraging Big Data) and NASA, who all follow a similar approach.
We can see that the application domains are very wide: online sales, translation, road directions, exploration of the universe, and beyond...
The basics of data science are statistics and data visualization. Let's start by understanding descriptive stats and data visualization:
In 1989 Ackoff demonstrated how data contributes to building knowledge by defining a hierarchy: "data - information - knowledge".
Statistical tools are very powerful for building knowledge out of data exploration. Statistics is used to aggregate observations on subjects sharing the same property. Mathematical formulas are applied to those observations to generate data on this property.
For example, Netflix stores observations on TV shows searched for or watched by each user. This allows them to identify a user’s interests. The good news is that with Netflix you need never miss a good TV show again :).
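As a rough illustration of aggregating observations on subjects that share a property, the Python sketch below groups a few viewing records by user and counts genres. The user names, genres and records are invented for the example; this is not how Netflix actually works, only the general idea.

```python
from collections import Counter, defaultdict

# Hypothetical (user, genre) observations; names and values are illustrative only.
observations = [
    ("alice", "sci-fi"),
    ("alice", "sci-fi"),
    ("alice", "comedy"),
    ("bob", "documentary"),
    ("bob", "documentary"),
    ("bob", "sci-fi"),
]

# Aggregate the observations per user: how often each genre was watched.
genres_by_user = defaultdict(Counter)
for user, genre in observations:
    genres_by_user[user][genre] += 1

# The most frequent genre is a crude estimate of each user's main interest.
top_genre = {
    user: counts.most_common(1)[0][0]
    for user, counts in genres_by_user.items()
}
print(top_genre)
```

Counting is the simplest possible aggregation. Real recommendation systems combine many more signals, but they start from the same kind of grouped observations.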
Nowadays, descriptive statistics are used to generate information in the form of tables, graphs, charts, and so on. Large amounts of data are easier to read when represented as charts. A data scientist will use those graphs to better understand the system being analyzed.
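To make this concrete, here is a minimal Python sketch that computes a few descriptive statistics and prints a crude text bar chart. The task durations are made-up values, chosen only to show how a single outlier stands out once the data is summarized and drawn.

```python
import statistics

# Hypothetical task durations in minutes (illustrative values, not real data).
durations_min = [12, 15, 11, 48, 14, 13, 16]

# Descriptive statistics: a compact summary of the observations.
summary = {
    "count": len(durations_min),
    "mean": statistics.mean(durations_min),
    "median": statistics.median(durations_min),
    "stdev": statistics.stdev(durations_min),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

# A very simple "chart": one bar per observation, one '#' per minute.
for d in durations_min:
    print(f"{d:>3} min | " + "#" * d)
```

The mean is pulled upward by the 48-minute task while the median is not, which is the kind of difference a chart makes visible at a glance.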
Mathematical models applied to statistics allow the data scientist to build predictions or suggestions. So the next, most interesting step is to build predictions through the application of artificial intelligence.
In summary, these are the types of questions that data science seeks to answer:
- What are the habits of the users of a system?
- What are the successive transformations of a given piece of data over a period of time?
- Is there a pattern in my business activity?
- What is the probability that a given event occurs?
- How much will a product cost in the coming week?
To "guess" the answers to these questions, data scientists gather a large amount of data produced by the system and apply mathematical formulas to extract information and gain knowledge about the business activity. To judge whether an event is likely to occur, the idea is to look at all the data, identify the preconditions of that event, and check whether they are currently met.
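One simple way to turn this idea into numbers is to estimate the probability of an event as its relative frequency among past cases where a precondition held. The Python sketch below does this with a hypothetical history, assuming a single precondition ("the task was claimed late") and a single event ("the deadline was missed"). Both are invented for illustration.

```python
# Hypothetical history: one (claimed_late, missed_deadline) pair per completed task.
history = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]

def event_rate(records, precondition_met):
    """Relative frequency of the event among records with the given precondition value."""
    outcomes = [event for pre, event in records if pre == precondition_met]
    if not outcomes:
        return None  # no comparable past cases, so no estimate
    return sum(outcomes) / len(outcomes)

p_missed_if_late = event_rate(history, True)      # 2 of 3 past cases
p_missed_if_on_time = event_rate(history, False)  # 1 of 4 past cases

print(f"P(missed | claimed late)    = {p_missed_if_late:.2f}")
print(f"P(missed | claimed on time) = {p_missed_if_on_time:.2f}")
```

With so few records the estimate is of course unreliable. The point is only that checking whether the precondition is currently met changes the answer to "how likely is this event?".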
Business Process Management (BPM)
The definition of a business process is based on a precise knowledge of the business and the organization. Using a BPM-based platform, people in charge of automating a process have full control over WHO must execute WHAT tasks and WHEN. The way users interact with the process is included in the process definition. Business data generated through a process is also clearly identified in the BPMN diagram model. The sequence of events is also part of the process definition.
In short, BPM aims at guaranteeing that users will perform tasks and update data in a pre-defined order, and often within a pre-defined time limit or deadline. Business rules are enforced by the process definition.
Data science applied to BPM
As BPM provides a constrained workflow for user activities (habits), automates data transformation and ensures that actions are performed in a pre-defined order, can data science answer specific questions from data generated by a BPM application?
The R&D team at Bonitasoft wants to confirm its idea that these techniques could help Bonita BPM users to gain knowledge about their processes.
Who would not be interested in improving maintainability, growth, efficiency and conformance to business goals and organizational constraints? Moreover, who would not be interested in tooling that helps do this?
Good news! That is what our team is working on!
We are currently focused on statistics coming from data available to us.
One of the major difficulties we are facing is the heterogeneity of the data Bonita BPM will have to analyze. Every project using Bonita BPM is different, and can belong to a different business vertical (e.g. e-learning, banking, education, manufacturing, and so on).
To explore data mining possibilities further, we are looking for more data from real environments in existing projects. This data can help us create powerful algorithms that are applicable to specific business verticals.
Help us build the new capabilities in Bonita BPM to make predictions on your business, and maybe even provide you with useful suggestions on how to improve your processes!
It’s actually pretty easy to participate, and your project remains anonymous with no sensitive information shared. Here’s how it works:
Execute the following query on your database. Only archived tasks, without names, are retrieved, so no sensitive information is included.
-- The table names and the inner SELECT list below are assumptions: they are
-- missing from this copy of the query and were inferred from the column names
-- and the fni / pi aliases. Check them against your Bonita database schema.
SELECT
    fni.ARCHIVEDATE AS STEP_COMPLETION_DATE,
    fni.LOGICALGROUP1 AS PROCESS_DEFINITION_ID,
    fni.FLOWNODEDEFINITIONID AS STEP_ID,
    fni.ROOTCONTAINERID AS CASE_ID,
    fni.EXECUTEDBYSUBSTITUTE AS STEP_EXECUTED_BY_SUBSTITUTE,
    fni.EXECUTEDBY AS STEP_EXECUTED_BY,
    fni.ASSIGNEEID AS STEP_ASSIGNEE_ID,
    fni.CLAIMEDDATE AS STEP_CLAIM_DATE,
    fni.EXPECTEDENDDATE AS STEP_EXPECTED_END_DATE,
    fni.TENANTID AS TENANT_ID,
    pi.STARTEDBYSUBSTITUTE AS CASE_STARTED_BY_SUBSTITUTE,
    pi.STARTEDBY AS CASE_STARTED_BY,
    pi.STARTDATE AS CASE_START_DATE,
    pi.ENDDATE AS CASE_END_DATE
FROM
    ARCH_FLOWNODE_INSTANCE fni,
    (
        SELECT *
        FROM ARCH_PROCESS_INSTANCE
        WHERE ENDDATE >= 1451606400
          AND ROOTPROCESSINSTANCEID = SOURCEOBJECTID
        ORDER BY ID DESC
    ) pi
WHERE
    fni.ROOTCONTAINERID = pi.ROOTPROCESSINSTANCEID
    AND fni.TERMINAL = 1
Then send the result back to us at email@example.com.
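If you are curious about what such an export can reveal, the Python sketch below computes two simple statistics per step: the median time between claim and completion, and the share of steps completed after their expected end date. It assumes the query result was saved as CSV with the column aliases as headers. The sample rows are invented, and the timestamps are treated as plain integers in whatever unit your database stores.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample of the query result exported as CSV (made-up values).
SAMPLE_CSV = """STEP_ID,CASE_ID,STEP_CLAIM_DATE,STEP_COMPLETION_DATE,STEP_EXPECTED_END_DATE
10,1,1000,1600,2000
10,2,1100,2500,2000
20,1,1700,1900,3000
20,2,2600,3400,3000
"""

durations = defaultdict(list)  # STEP_ID -> claim-to-completion durations
late = defaultdict(int)        # STEP_ID -> steps completed after the expected end date

for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    step = row["STEP_ID"]
    completed = int(row["STEP_COMPLETION_DATE"])
    durations[step].append(completed - int(row["STEP_CLAIM_DATE"]))
    if completed > int(row["STEP_EXPECTED_END_DATE"]):
        late[step] += 1

# One summary entry per step definition.
report = {
    step: {
        "median_duration": statistics.median(values),
        "late_ratio": late[step] / len(values),
    }
    for step, values in durations.items()
}

for step, stats in report.items():
    print(step, stats)
```

On a real export you would read the file with `open(...)` instead of the inline sample, and you would probably need to handle steps that were never claimed or have no expected end date.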
Watch for our next article, in which the team will share intermediate results made possible by your contributions.
Stay tuned, and have fun with Bonita!