Welcome to Deacto!
What is Deacto
Deacto (from the Latin for ‘concerning’ + actionability) helps businesses take a proactive approach to problem-solving and was developed to support actionable analytics.
Actionable analytics is a broad range of business analytics and data science processes, methods and tools to create business confidence to act upon analytics, which results in performance improvement.
Deacto directly models actionability based on the data and on the domain business knowledge provided by the user.
Ultimately, the goal of Deacto is to provide a straightforward answer to an intuitive decision-making question: “How should the business act?”
Methodology
This section describes the basic ideas and notation that Deacto uses to perform actionable analytics. For more information, refer to the notebook that provides a basic illustration of Deacto on a real-world dataset.
Data
Consider a dataset describing a collection of business objects that a company wants to use to its advantage. The data contains \(N\) instances. Each instance represents one business object and consists of a unique identifier, \(M\) input features and a target feature (\(y\)).
Mathematically,
\(Data = \left\{ \left( X_{n},\ y_{n} \right) \right\}_{n = 1}^{N}\)
where:
\(F = \left\{ F_{1},\ldots,F_{m},\ldots,F_{M} \right\}\) – input feature set
\(D(F_{m})\) – domain of input feature \(m\)
\(X_{n} \in D(F_{1}) \times \ldots \times D(F_{m}) \times \ldots \times D(F_{M})\) – instance \(n\), defined on the set of possible instances
\(x(m,n)\) – the value of feature \(m\) in instance \(n\)
\(y_{n} \in D(y)\) – the value of the target feature in instance \(n\), where two values are possible: \(|D(y)| = 2\)
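As a rough illustration of this notation, the sketch below builds a toy dataset from plain Python records. The column names and values are invented for the example and are not part of Deacto.

```python
# Toy dataset: each record is one business object (hypothetical values).
records = [
    {"id": 1, "plan": "basic", "discount_A_flg": 0, "y": "Leave"},
    {"id": 2, "plan": "premium", "discount_A_flg": 1, "y": "Stay"},
    {"id": 3, "plan": "basic", "discount_A_flg": 1, "y": "Stay"},
]

feature_names = ["plan", "discount_A_flg"]  # F = {F_1, ..., F_M}
N = len(records)                            # number of instances
M = len(feature_names)                      # number of input features

# D(F_m): the set of values each input feature takes in the data
domains = {f: sorted({r[f] for r in records}) for f in feature_names}
# D(y): the target domain; |D(y)| = 2 for a binary target
target_domain = sorted({r["y"] for r in records})


def x(m, n):
    """Value of feature m in instance n (both 1-based, as in the notation)."""
    return records[n - 1][feature_names[m - 1]]


print(N, M, domains, target_domain, x(2, 1))
```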
Business Object Value
The target feature value represents the state of a business object. Each state is mapped to the potential value of the object to the company.
Mathematically,
\({BOV}_{i} = \left\{ {BOS}_{i} : value \right\}\)
where \({BOV}_{i} \in \mathbb{R}\), \(\forall i\)
For example, in the customer retention domain, defining the value requires mapping whether a customer stays or leaves to the monetary value the company would gain in each case.
One simple example of such a function is as follows:
\(BOV = \left\{ Stay:100,\ Leave:0 \right\}\)
where the value for “Stay” is 100, calculated as the sum of potential profits from the customer if they stay with the company. If a customer leaves the company, the benefit is 0.
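The mapping in this example can be sketched as a plain dictionary. The state label "Leave" and the helper `expected_value` are assumptions introduced for illustration, not Deacto API.

```python
# Business object value (BOV): maps each target state to its value for the company.
# The values 100 ("Stay") and 0 (customer leaves) come from the example above;
# the label "Leave" is an assumed name for the second state.
BOV = {"Stay": 100.0, "Leave": 0.0}


def expected_value(p_stay):
    """Expected value of one customer given the probability of staying."""
    return p_stay * BOV["Stay"] + (1.0 - p_stay) * BOV["Leave"]


print(BOV["Stay"], BOV["Leave"])
print(expected_value(0.5))  # 50.0
```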
Actions
The aim is to identify a set of actions that can transform business objects from a less valuable state to a more valuable one while considering potential costs. This results in the optimization of expected utility. The subset of input features that are under control and can be modified to formulate an action is called the actionable feature set (AFS), while the subset of input features that are out of control and cannot be changed is called the non-actionable feature set (NAFS).
An action on a given feature \({AFS}_{i\_ afs}\) in \(AFS\) changes the feature from a source value to a target value, where both are defined on the feature domain \(D({AFS}_{i\_ afs})\).
Mathematically,
Costs
Costs are represented by a set of functions that map each possible business action to the monetary cost of that change for one business object.
Mathematically,
For example, in the customer retention domain, the data can contain information about different discounts and how they impact customer retention. Giving a discount to customers has a cost for the business but can make the retention effort more effective, bringing a benefit from customers who stay with the company.
If we want to consider granting a discount of type A (discount_A_flg), we would formulate a cost function as:
This cost function defines a cost of 20 if the algorithm considers granting a discount to a customer. Since a discount cannot be cancelled, the cost function maps the reverse change to an infinite cost.
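A minimal sketch of the discount cost function described above, written as a lookup table keyed by (source, target). It assumes that the flag takes the values 0 and 1, that leaving the flag unchanged is free, and that the overall cost of an action is the sum of the per-object costs. All three are illustrative assumptions rather than documented Deacto behaviour.

```python
import math

# Cost table for the feature discount_A_flg, keyed by (source, target) value.
# Assumptions: the flag is 0/1 and an unchanged value costs nothing.
discount_a_costs = {
    (0, 1): 20.0,      # grant the discount
    (1, 0): math.inf,  # cancelling a discount is not possible
    (0, 0): 0.0,
    (1, 1): 0.0,
}


def action_cost(cost_table, source, target):
    """Cost of changing one business object from source to target."""
    return cost_table[(source, target)]


def overall_cost(cost_table, source_values, target):
    """Assumed overall cost: sum of per-object costs over all affected objects."""
    return sum(action_cost(cost_table, s, target) for s in source_values)


print(action_cost(discount_a_costs, 0, 1))           # 20.0
print(overall_cost(discount_a_costs, [0, 0, 1], 1))  # 40.0
```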
The cost of an action for a given business object:
Overall cost of an action:
Utility
Utility represents the monetary value of the expected added benefit of a set of actions, taking both benefits and costs into account.
For a given action and business object:
For example, given the value and cost above, assume that granting a discount to customer \(n\) increases their probability of \(c^{*} = Stay\) from 0.4 to 0.8 and that this is the only action we consider. The expected utility is as follows:
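The arithmetic of this example can be worked through in a few lines. The sketch assumes that expected utility is the gain in expected value (probability after minus probability before, times the difference between the two state values) minus the cost of the action. This formula is an interpretation of the definitions above, not necessarily the exact expression Deacto uses.

```python
def expected_utility(p_before, p_after, value_desired, value_other, cost):
    """Assumed form: gain in expected value minus the cost of the action."""
    gain = (p_after - p_before) * (value_desired - value_other)
    return gain - cost


# Values from the running example: "Stay" is worth 100, leaving is worth 0,
# the discount costs 20 and lifts the probability of staying from 0.4 to 0.8.
u = expected_utility(p_before=0.4, p_after=0.8,
                     value_desired=100.0, value_other=0.0, cost=20.0)
print(round(u, 2))  # 20.0
```

Under these assumptions the discount is worth granting: the expected gain of 40 exceeds the cost of 20.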
Overall utility of action:
Ultimately, our goal is to discover a set of actions that can yield the maximum overall utility for the business:
\({Actions}^{*} = \ \left\{ A_{1},\ldots,A_{{i\_ action}^{*}},\ldots,\ A_{{n\_ action}^{*}} \right\}\)
where \(A_{{i\_ action}^{*}} \in Actions\), \(\forall\ i\_ action\)
This optimal set of actions is presented to the user, with each action described in detail, evaluated, and ranked.
References
[1] T. H. Davenport, J. G. Harris, and R. Morison, Analytics at Work: Smarter Decisions, Better Results. Harvard Business Press, 2010.
[2] L. Cao, “Actionable Knowledge Discovery and Delivery”, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 2, No. 2, March 2012, pp. 149-163, 2012.
Notebooks
Deacto Basic Notebook
[1]:
%reload_ext autoreload
%autoreload 2
Set project directory
[2]:
import os
DEA_SYS_PROJECT_NAME = 'airlines'
notebook_dir = os.getcwd()
DEA_SYS_PROJ_DIR = os.path.join(notebook_dir, DEA_SYS_PROJECT_NAME)
Data description
Feature set :
Gender: Gender of the passengers (Female, Male)
Customer Type: The customer type (Loyal customer, disloyal customer)
Age: The actual age of the passengers
Type of Travel: Purpose of the flight of the passengers (Personal Travel, Business Travel)
Class: Travel class in the plane of the passengers (Business, Eco, Eco Plus)
Flight distance: The flight distance of this journey
Inflight wifi service: Satisfaction level of the inflight wifi service (0:Not Applicable;1-5)
Departure/Arrival time convenient: Satisfaction level of Departure/Arrival time convenient
Ease of Online booking: Satisfaction level of online booking
Gate location: Satisfaction level of Gate location
Food and drink: Satisfaction level of Food and drink
Online boarding: Satisfaction level of online boarding
Seat comfort: Satisfaction level of Seat comfort
Inflight entertainment: Satisfaction level of inflight entertainment
On-board service: Satisfaction level of On-board service
Leg room service: Satisfaction level of Leg room service
Baggage handling: Satisfaction level of baggage handling
Check-in service: Satisfaction level of Check-in service
Inflight service: Satisfaction level of inflight service
Cleanliness: Satisfaction level of Cleanliness
Departure Delay in Minutes: Minutes delayed when departure
Arrival Delay in Minutes: Minutes delayed when Arrival
Satisfaction: Airline satisfaction level (satisfied, neutral or dissatisfied)
https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Read data
We use a subset of the input features.
[3]:
import pandas as pd
airlines_data = pd.read_csv(os.path.join(DEA_SYS_PROJ_DIR, 'airlines_data.csv'))
print(f"Number of customers : {airlines_data.shape[0]}")
print("Features:\n" + "\n".join([f for f in airlines_data.columns]))
Number of customers : 99362
Features:
id
Customer_Type
Type_of_Travel
Class
Gender
Age
Ease_of_Online_booking
Online_boarding
Inflight_service
Inflight_entertainment
satisfaction
User config specification
DEA_SYS_USER_CONFIG
DEA_SYS_PROJECT_NAME - project name, matches the project’s folder name
DEA_EXPERIMENT_ID - distinguishes Deacto runs that use different sets of config values
DEA_SYS_OUTPUT_FILE - flag (True/False) indicating whether to write the output table to a .csv file
DEA_USER_CONFIG
Data
bo_id - column holding the business object identifier. Use None if no such column exists
target_feature - the data column that represents the business object state
Dbact
input_bov - function that maps each object state to its value for the company
input_nafs - the subset of input features that are out of control and cannot be changed
input_costs - set of functions that map each possible business action to the monetary cost of that change for one business object
Experiment 1 - 0/1 business object value + zero action costs
Set user config
Deacto requires a set of parameters to be provided by the user in YAML format. This YAML file can be prepared in advance and saved in the project directory. Here we show how to prepare the file inside the notebook.
[4]:
def write_deacto_user_config(DEA_SYS_PROJ_DIR, config_yaml_str):
    # Parse the YAML string and save it as the user config file in the project directory
    import yaml
    config_file = os.path.join(DEA_SYS_PROJ_DIR, 'deacto_user_config.yaml')
    with open(config_file, 'w') as file:
        yaml.dump(yaml.safe_load(config_yaml_str), file)
[5]:
config_input_costs = {}
deacto_user_config_yaml_str = (f"""
DEA_SYS_USER_CONFIG:
    DEA_SYS_PROJECT_NAME: airlines
    DEA_EXPERIMENT_ID: Experiment 1
    DEA_SYS_OUTPUT_FILE : True
DEA_USER_CONFIG:
    Data:
        bo_id: id
        target_feature: satisfaction
    Dbact:
        input_bov:
            'satisfied': 1
            'neutral or dissatisfied': 0
        input_nafs: ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age']
        input_costs: {config_input_costs}
""")
write_deacto_user_config(DEA_SYS_PROJ_DIR, deacto_user_config_yaml_str)
User config description
DEA_SYS_USER_CONFIG
DEA_SYS_PROJECT_NAME - airlines - corresponds to dataset topic
DEA_EXPERIMENT_ID - experiment 1 - simple run order
DEA_SYS_OUTPUT_FILE - True
DEA_USER_CONFIG
Data
bo_id - id , the data already contains a business object identifier; each business object is a customer of the company
target_feature - satisfaction , which has two possible values
Dbact
input_bov - value of 1 for a customer who is ‘satisfied’ and value of 0 for a customer who is ‘neutral or dissatisfied’ (0/1 customer values)
input_nafs - [‘Customer_Type’, ‘Type_of_Travel’, ‘Class’, ‘Gender’, ‘Age’] - features that relate to personal customer attributes or company attributes and cannot be changed
input_costs - Empty . Zero costs
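To see which features remain available for actions under this config, the sketch below subtracts the identifier, the target and the non-actionable features from the column list printed earlier. It assumes that every remaining input feature is treated as actionable, which follows the AFS/NAFS definition in the methodology but has not been verified against Deacto's implementation.

```python
# Columns of airlines_data as printed in the "Read data" step
columns = [
    "id", "Customer_Type", "Type_of_Travel", "Class", "Gender", "Age",
    "Ease_of_Online_booking", "Online_boarding", "Inflight_service",
    "Inflight_entertainment", "satisfaction",
]

bo_id = "id"
target_feature = "satisfaction"
input_nafs = ["Customer_Type", "Type_of_Travel", "Class", "Gender", "Age"]

# Assumption: AFS = input features that are not listed as non-actionable
excluded = {bo_id, target_feature, *input_nafs}
afs = [c for c in columns if c not in excluded]
print(afs)
```

Under that assumption, the four remaining satisfaction-level features are the ones an action can change.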
Initialize Deacto
[6]:
from deacto import deacto
deacto_analysis = deacto.Deacto(
    DEA_SYS_PROJ_DIR,
    airlines_data
)
Perform deacto analysis
[ ]:
dbact_output_table = deacto_analysis.perform_deacto_analysis()
airlines - 2023-04-07 13:05:42 : INFO : Starting Deacto for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : INFO : Starting Deacto data preparation for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : INFO : Input data validation finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:42 : INFO : Input data validation for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:42 : DEBUG : Starting OutlierHandler for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : DEBUG : Function: OutlierHandler.fit_transform, Time: 0.01 sec
airlines - 2023-04-07 13:05:42 : DEBUG : OutlierHandler for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:43 : DEBUG : Starting Discretization for 'Experiment 1' ...
airlines - 2023-04-07 13:05:45 : DEBUG : Function: Discretization.fit_transform, Time: 1.98 sec
airlines - 2023-04-07 13:05:45 : DEBUG : Discretization for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:45 : DEBUG : Starting CategoricalEncoding for 'Experiment 1' ...
airlines - 2023-04-07 13:05:45 : DEBUG : Function: CategoricalEncoding.fit_transform, Time: 0.77 sec
airlines - 2023-04-07 13:05:45 : DEBUG : CategoricalEncoding for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : DEBUG : Starting FeatureSelection for 'Experiment 1' ...
airlines - 2023-04-07 13:05:46 : DEBUG : Function: FeatureSelection.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:46 : DEBUG : FeatureSelection for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : INFO : Deacto data preparation for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : INFO : Deacto data prep finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:46 : INFO : Starting Deacto modeling for 'Experiment 1' ...
airlines - 2023-04-07 13:05:46 : DEBUG : depth : 0 , direction All , input_path []
airlines - 2023-04-07 13:05:47 : DEBUG : find_best_split - feature_name : Online_boarding, cutoff : 4, utility : 38571.1
airlines - 2023-04-07 13:05:47 : DEBUG : depth : 1 , direction left , input_path ['left']
airlines - 2023-04-07 13:05:49 : DEBUG : find_best_split - feature_name : Online_boarding, cutoff : 3, utility : 53525.1
airlines - 2023-04-07 13:05:49 : DEBUG : depth : 2 , direction left , input_path ['left', 'left']
airlines - 2023-04-07 13:05:49 : DEBUG : depth : 2 , direction right , input_path ['left', 'right']
airlines - 2023-04-07 13:05:49 : DEBUG : depth : 1 , direction right , input_path ['right']
airlines - 2023-04-07 13:05:53 : DEBUG : find_best_split - feature_name : Ease_of_Online_booking, cutoff : 4, utility : 96647.5
airlines - 2023-04-07 13:05:53 : DEBUG : depth : 2 , direction left , input_path ['right', 'left']
airlines - 2023-04-07 13:05:53 : DEBUG : depth : 2 , direction right , input_path ['right', 'right']
airlines - 2023-04-07 13:05:53 : DEBUG : Function: DecisionTreeUtilityClassifier.perform_dtopt_dbact, Time: 7.15 sec
airlines - 2023-04-07 13:05:53 : INFO : Deacto modeling finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:53 : INFO : Starting Deacto business actions extraction for 'Experiment 1' ...
airlines - 2023-04-07 13:05:53 : INFO : Deacto business actions extraction finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:53 : INFO : Deacto finished successfully for 'Experiment 1'.
Deacto output table exploration
[ ]:
dbact_output_table
| | pre_condition | set | change | benefit | costs | utility | ROI | utility_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | Online_boarding : [1, 2, 3] | Ease_of_Online_booking : [5] | Online_boarding : 1 -> [5] , Online_boarding :... | 36317.4 | 0 | 36317.4 | inf | 79.4 |
| 1 | Online_boarding : [4] | Ease_of_Online_booking : [5] | Online_boarding : 4 -> [5] | 8448.7 | 0 | 8448.7 | inf | 18.5 |
| 2 | Online_boarding : [5] , Ease_of_Online_booking... | | Ease_of_Online_booking : 1 -> [5] , Ease_of_On... | 984.2 | 0 | 984.2 | inf | 2.2 |
Deacto output table specification
Each row describes one action set, its description and evaluation metrics.
Business can decide to act based on one or more action sets in the table.
Columns and their description :
Action description :
pre_condition - defines the subpopulation of objects to which the action set is applied
change - actions that change a feature from a source value to a target value
set - actions that set a feature to a specific (target) value regardless of its source value
Action evaluation :
benefit - added value from the action set
costs - overall cost of action set
utility - lower bound of utility
ROI - lower bound of ROI , utility-to-costs ratio (%)
utility_pct - ratio of the action set utility to total utility (%)
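The relationships between these columns can be checked by hand. The sketch below assumes utility = benefit - costs, ROI = utility / costs × 100 (infinite when costs are zero) and utility_pct = the share of each action set in the summed utility. These formulas are inferred from the column descriptions and the reported numbers, not taken from Deacto's source. The inputs are the Experiment 1 utilities shown above and the benefit and costs reported for Experiment 2 further down.

```python
import math


def evaluate(benefit, costs):
    """Return (utility, ROI in %) under the assumed formulas."""
    utility = benefit - costs
    roi = math.inf if costs == 0 else utility / costs * 100
    return utility, roi


def utility_pct(utilities):
    """Share of each action set in the total utility, in %."""
    total = sum(utilities)
    return [round(u / total * 100, 1) for u in utilities]


# Experiment 1: zero costs, so ROI is infinite
print(evaluate(36317.4, 0))                   # (36317.4, inf)
print(utility_pct([36317.4, 8448.7, 984.2]))  # [79.4, 18.5, 2.2]

# Experiment 2: benefit 148488.3, costs 107773.0
utility, roi = evaluate(148488.3, 107773.0)
print(round(utility, 1), round(roi, 1))       # 40715.3 37.8
```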
Experiment 1 output table exploration
The first action set achieves ~79.4% of the total utility (utility_pct = 79.4%) and should be examined.
[ ]:
dbact_output_table.loc[0].to_dict()
{'pre_condition': 'Online_boarding : [1, 2, 3]',
'set': 'Ease_of_Online_booking : [5]',
'change': 'Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5]',
'benefit': 36317.4,
'costs': 0,
'utility': 36317.4,
'ROI': inf,
'utility_pct': 79.4}
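For downstream processing it can help to turn the `change` string into structured tuples. The parser below is a hypothetical helper, not part of Deacto. It assumes the `feature : source -> [target]` format seen in the row above, with entries separated by commas.

```python
import re

# Matches one entry of the form "feature : source -> [target]"
CHANGE_PATTERN = re.compile(r"(\w+)\s*:\s*(\S+)\s*->\s*\[([^\]]*)\]")


def parse_changes(change):
    """Split a 'change' string into (feature, source, target) tuples of strings."""
    return [(f, src, tgt.strip()) for f, src, tgt in CHANGE_PATTERN.findall(change)]


change = ("Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , "
          "Online_boarding : 3 -> [5]")
for feature, source, target in parse_changes(change):
    print(feature, source, target)
```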
Experiment 1 - Conclusions
Online_boarding and Ease_of_Online_booking satisfaction levels are important for action discovery
The change is very significant: Ease_of_Online_booking is proposed to be set to the highest satisfaction level (5) and Online_boarding is proposed to be increased from levels 1, 2 and 3 to the highest satisfaction level (5).
The actions make sense in the absence of costs. The expected benefit is ~36,000, which, with 0/1 values and zero costs, corresponds to the number of additional customers expected to be satisfied.
It is assumed that the company is able to directly influence the satisfaction levels above
Experiment 2 - 0/1 business object value + some action costs
Let us consider a more realistic scenario in which the cost of improving the satisfaction level for ‘Ease_of_Online_booking’ and ‘Online_boarding’ increases linearly with the size of the improvement.
We assign extremely high costs to decreasing satisfaction levels, since such a change isn’t acceptable.
The costs apply per customer, that is, they are variable costs.
We also increase the value of a customer in the desired state to 5.
Set input costs
[ ]:
config_input_costs = {}
for f in ['Ease_of_Online_booking','Online_boarding','Inflight_entertainment','Inflight_service']:
    f_domain = sorted(airlines_data[f].unique())
    # Cost grows linearly with the improvement (j - i); any decrease gets a prohibitive cost of 1000
    i_costs = {str((i,j)): (j - i) if j >= i else 1000 for i in f_domain for j in f_domain}
    config_input_costs[f] = i_costs
print(f"'Online_boarding' changing from 2 to 5 costs is : {config_input_costs['Online_boarding']['(2, 5)']}")
print(f"'Ease_of_Online_booking' changing from 3 to 2 costs is : {config_input_costs['Ease_of_Online_booking']['(3, 2)']}")
'Online_boarding' changing from 2 to 5 costs is : 3
'Ease_of_Online_booking' changing from 3 to 2 costs is : 1000
Set user config
[ ]:
config_yaml_str = (f"""
DEA_SYS_USER_CONFIG:
    DEA_SYS_PROJECT_NAME: airlines
    DEA_EXPERIMENT_ID: Experiment 2
    DEA_SYS_OUTPUT_FILE : True
DEA_USER_CONFIG:
    Data:
        bo_id: id
        target_feature: satisfaction
    Dbact:
        input_bov:
            'satisfied': 5
            'neutral or dissatisfied': 0
        input_nafs: ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age']
        input_costs: {config_input_costs}
""")
write_deacto_user_config(DEA_SYS_PROJ_DIR, config_yaml_str)
Run Deacto
[ ]:
from deacto import deacto
deacto_analysis = deacto.Deacto(
    DEA_SYS_PROJ_DIR,
    airlines_data
)
dbact_output_table = deacto_analysis.perform_deacto_analysis()
airlines - 2023-04-07 13:05:54 : INFO : Deacto configuration finished successfully .
airlines - 2023-04-07 13:05:54 : INFO : Starting Deacto for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : INFO : Starting Deacto data preparation for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : INFO : Input data validation finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:05:54 : INFO : Input data validation for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:54 : DEBUG : Starting OutlierHandler for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : DEBUG : Function: OutlierHandler.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:54 : DEBUG : OutlierHandler for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:54 : DEBUG : Starting Discretization for 'Experiment 2' ...
airlines - 2023-04-07 13:05:55 : DEBUG : Function: Discretization.fit_transform, Time: 1.21 sec
airlines - 2023-04-07 13:05:55 : DEBUG : Discretization for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:56 : DEBUG : Starting CategoricalEncoding for 'Experiment 2' ...
airlines - 2023-04-07 13:05:56 : DEBUG : Function: CategoricalEncoding.fit_transform, Time: 0.8 sec
airlines - 2023-04-07 13:05:56 : DEBUG : CategoricalEncoding for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : DEBUG : Starting FeatureSelection for 'Experiment 2' ...
airlines - 2023-04-07 13:05:57 : DEBUG : Function: FeatureSelection.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:57 : DEBUG : FeatureSelection for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : INFO : Deacto data preparation for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : INFO : Deacto data prep finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:05:57 : INFO : Starting Deacto modeling for 'Experiment 2' ...
airlines - 2023-04-07 13:05:57 : DEBUG : depth : 0 , direction All , input_path []
airlines - 2023-04-07 13:05:58 : DEBUG : find_best_split - feature_name : Online_boarding, cutoff : 4, utility : 25836.3
airlines - 2023-04-07 13:05:58 : DEBUG : depth : 1 , direction left , input_path ['left']
airlines - 2023-04-07 13:06:03 : DEBUG : find_best_split - feature_name : Class, cutoff : 1, utility : 40715.3
airlines - 2023-04-07 13:06:03 : DEBUG : depth : 2 , direction left , input_path ['left', 'left']
airlines - 2023-04-07 13:06:03 : DEBUG : depth : 2 , direction right , input_path ['left', 'right']
airlines - 2023-04-07 13:06:03 : DEBUG : depth : 1 , direction right , input_path ['right']
airlines - 2023-04-07 13:06:21 : DEBUG : Function: DecisionTreeUtilityClassifier.perform_dtopt_dbact, Time: 24.09 sec
airlines - 2023-04-07 13:06:21 : INFO : Deacto modeling finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:06:21 : INFO : Starting Deacto business actions extraction for 'Experiment 2' ...
airlines - 2023-04-07 13:06:21 : INFO : Deacto business actions extraction finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:06:21 : INFO : Deacto finished successfully for 'Experiment 2'.
Deacto output table exploration
[ ]:
dbact_output_table
| | pre_condition | set | change | benefit | costs | utility | ROI | utility_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | Online_boarding : [1, 2, 3, 4] , Class : ['Eco... | | Online_boarding : 1 -> [5] , Online_boarding :... | 148488.3 | 107773.0 | 40715.3 | 37.8 | 100.0 |
Experiment 2 output table exploration
Only one action set is found. It accounts for all of the utility (utility_pct = 100%) and should be examined.
[ ]:
dbact_output_table.loc[0].to_dict()
{'pre_condition': "Online_boarding : [1, 2, 3, 4] , Class : ['Eco', 'Eco Plus']",
'set': '',
'change': 'Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5] , Online_boarding : 4 -> [5]',
'benefit': 148488.3,
'costs': 107773.0,
'utility': 40715.3,
'ROI': 37.8,
'utility_pct': 100.0}
Experiment 2 - Conclusions
The Online_boarding satisfaction level is the most important for action discovery
The change is very significant: Online_boarding is proposed to be increased to the highest satisfaction level (5) for the ‘Eco’ and ‘Eco Plus’ classes
While it makes sense that customers would be satisfied with “Business” class service, to optimize benefits for the company it is recommended to prioritize effort on online boarding.