Deacto Basic Notebook
[1]:
%reload_ext autoreload
%autoreload 2
Set project directory
[2]:
import os
DEA_SYS_PROJECT_NAME = 'airlines'
notebook_dir = os.getcwd()
DEA_SYS_PROJ_DIR = os.path.join(notebook_dir, DEA_SYS_PROJECT_NAME)
Data description
Feature set:
Gender: Gender of the passengers (Female, Male)
Customer Type: The customer type (Loyal customer, disloyal customer)
Age: The actual age of the passengers
Type of Travel: Purpose of the passenger's flight (Personal Travel, Business Travel)
Class: Travel class of the passenger on the plane (Business, Eco, Eco Plus)
Flight distance: The flight distance of this journey
Inflight wifi service: Satisfaction level of the inflight wifi service (0:Not Applicable;1-5)
Departure/Arrival time convenient: Satisfaction level with the convenience of departure/arrival times
Ease of Online booking: Satisfaction level of online booking
Gate location: Satisfaction level of Gate location
Food and drink: Satisfaction level of Food and drink
Online boarding: Satisfaction level of online boarding
Seat comfort: Satisfaction level of Seat comfort
Inflight entertainment: Satisfaction level of inflight entertainment
On-board service: Satisfaction level of On-board service
Leg room service: Satisfaction level of Leg room service
Baggage handling: Satisfaction level of baggage handling
Check-in service: Satisfaction level of Check-in service
Inflight service: Satisfaction level of inflight service
Cleanliness: Satisfaction level of Cleanliness
Departure Delay in Minutes: Minutes of delay at departure
Arrival Delay in Minutes: Minutes of delay at arrival
Satisfaction: Airline satisfaction level (satisfied, neutral or dissatisfied)
https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Read data
We use a subset of the input features.
[3]:
import pandas as pd
airlines_data = pd.read_csv(os.path.join(DEA_SYS_PROJ_DIR, 'airlines_data.csv'))
print(f"Number of customers : {airlines_data.shape[0]}")
print("Features:\n" + "\n".join([f for f in airlines_data.columns]))
Number of customers : 99362
Features:
id
Customer_Type
Type_of_Travel
Class
Gender
Age
Ease_of_Online_booking
Online_boarding
Inflight_service
Inflight_entertainment
satisfaction
User config specification
DEA_SYS_USER_CONFIG
DEA_SYS_PROJECT_NAME - project name; matches the project’s folder name
DEA_EXPERIMENT_ID - distinguishes between Deacto runs with different sets of config values
DEA_SYS_OUTPUT_FILE - flag (True/False) indicating whether to write the output table to a .csv file
DEA_USER_CONFIG
Data
bo_id - column holding the business object identifier. Use None if no such column exists
target_feature - the column in the data that represents the business object state
Dbact
input_bov - mapping from each business object state to its value for the company
input_nafs - the subset of input features that are outside the company's control and cannot be changed
input_costs - set of functions mapping each possible business action to the monetary value (cost) of that change for one business object
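The user config is a nested mapping with two top-level sections. As a minimal sketch, the hypothetical helper below (not part of Deacto) checks that a parsed config, for example the dict returned by `yaml.safe_load`, contains every section and key listed above. Only the key layout comes from this specification; the helper and its names are assumptions.

```python
# Hypothetical validator: the required-key layout mirrors the specification above,
# but the helper itself is an illustration, not part of Deacto.
REQUIRED_KEYS = {
    "DEA_SYS_USER_CONFIG": ["DEA_SYS_PROJECT_NAME", "DEA_EXPERIMENT_ID", "DEA_SYS_OUTPUT_FILE"],
    "DEA_USER_CONFIG": {
        "Data": ["bo_id", "target_feature"],
        "Dbact": ["input_bov", "input_nafs", "input_costs"],
    },
}


def missing_config_keys(config, required=REQUIRED_KEYS, prefix=""):
    """Return dotted paths of required sections/keys that are absent from `config`."""
    missing = []
    if isinstance(required, dict):
        # Nested sections: recurse into each one that is present
        for section, sub_required in required.items():
            path = f"{prefix}{section}"
            if not isinstance(config, dict) or section not in config:
                missing.append(path)
            else:
                missing.extend(missing_config_keys(config[section], sub_required, path + "."))
    else:
        # Leaf level: a list of keys that must be present (values may be None)
        for key in required:
            if not isinstance(config, dict) or key not in config:
                missing.append(f"{prefix}{key}")
    return missing


config = {
    "DEA_SYS_USER_CONFIG": {
        "DEA_SYS_PROJECT_NAME": "airlines",
        "DEA_EXPERIMENT_ID": "Experiment 1",
        "DEA_SYS_OUTPUT_FILE": True,
    },
    "DEA_USER_CONFIG": {
        "Data": {"bo_id": "id", "target_feature": "satisfaction"},
        "Dbact": {
            "input_bov": {"satisfied": 1, "neutral or dissatisfied": 0},
            "input_nafs": ["Customer_Type", "Type_of_Travel", "Class", "Gender", "Age"],
            "input_costs": {},
        },
    },
}
print(missing_config_keys(config))  # []
```

An empty list means every required key is present; a partial config reports the dotted path of each missing key or section.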
Experiment 1 - 0/1 business object value + zero action costs
Set user config
Deacto requires a set of user-provided parameters in YAML format. This YAML file can be prepared in advance and saved in the project directory; here we show how to prepare it inside the notebook.
[4]:
def write_deacto_user_config(DEA_SYS_PROJ_DIR, config_yaml_str):
    import yaml
    config_file = os.path.join(DEA_SYS_PROJ_DIR, 'deacto_user_config.yaml')
    with open(config_file, 'w') as file:
        yaml.dump(yaml.safe_load(config_yaml_str), file)
[5]:
config_input_costs = {}
deacto_user_config_yaml_str = (f"""
DEA_SYS_USER_CONFIG:
  DEA_SYS_PROJECT_NAME: airlines
  DEA_EXPERIMENT_ID: Experiment 1
  DEA_SYS_OUTPUT_FILE: True
DEA_USER_CONFIG:
  Data:
    bo_id: id
    target_feature: satisfaction
  Dbact:
    input_bov:
      'satisfied': 1
      'neutral or dissatisfied': 0
    input_nafs: ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age']
    input_costs: {config_input_costs}
""")
write_deacto_user_config(DEA_SYS_PROJ_DIR, deacto_user_config_yaml_str)
User config description
DEA_SYS_USER_CONFIG
DEA_SYS_PROJECT_NAME - airlines, named after the dataset topic
DEA_EXPERIMENT_ID - Experiment 1, the first run in order
DEA_SYS_OUTPUT_FILE - True
DEA_USER_CONFIG
Data
bo_id - id; the data already contains a business object identifier, where the business object is a company customer
target_feature - satisfaction, which has two possible values (‘satisfied’ and ‘neutral or dissatisfied’)
Dbact
input_bov - a value of 1 for a ‘satisfied’ customer and 0 for a ‘neutral or dissatisfied’ customer (0/1 customer values)
input_nafs - [‘Customer_Type’, ‘Type_of_Travel’, ‘Class’, ‘Gender’, ‘Age’] - features that relate to personal customer attributes or to company attributes and cannot be changed
input_costs - empty, i.e. all actions have zero cost
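Once bo_id, target_feature and input_nafs are fixed, the remaining columns are presumably the features on which Deacto may propose actions. A small sketch under that assumption, using the column list printed when the data was read; the helper `actionable_features` is hypothetical and not Deacto's code.

```python
# Columns as printed by the "Read data" cell
columns = ["id", "Customer_Type", "Type_of_Travel", "Class", "Gender", "Age",
           "Ease_of_Online_booking", "Online_boarding", "Inflight_service",
           "Inflight_entertainment", "satisfaction"]

bo_id = "id"
target_feature = "satisfaction"
input_nafs = ["Customer_Type", "Type_of_Travel", "Class", "Gender", "Age"]


def actionable_features(columns, bo_id, target_feature, input_nafs):
    """Columns left after removing the identifier, the target and the non-actionable features.

    Assumption: this mirrors which features are candidates for business actions;
    it is an illustration, not taken from Deacto's source.
    """
    excluded = {target_feature, *input_nafs}
    if bo_id is not None:  # bo_id may be None when the data has no identifier column
        excluded.add(bo_id)
    return [c for c in columns if c not in excluded]


print(actionable_features(columns, bo_id, target_feature, input_nafs))
# ['Ease_of_Online_booking', 'Online_boarding', 'Inflight_service', 'Inflight_entertainment']
```

These four remaining features are the same ones for which costs are defined in Experiment 2.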
Initialize Deacto
[6]:
from deacto import deacto
deacto_analysis = deacto.Deacto(
DEA_SYS_PROJ_DIR,
airlines_data
)
Perform Deacto analysis
[ ]:
dbact_output_table = deacto_analysis.perform_deacto_analysis()
airlines - 2023-04-07 13:05:42 : INFO : Starting Deacto for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : INFO : Starting Deacto data preparation for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : INFO : Input data validation finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:42 : INFO : Input data validation for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:42 : DEBUG : Starting OutlierHandler for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : DEBUG : Function: OutlierHandler.fit_transform, Time: 0.01 sec
airlines - 2023-04-07 13:05:42 : DEBUG : OutlierHandler for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:43 : DEBUG : Starting Discretization for 'Experiment 1' ...
airlines - 2023-04-07 13:05:45 : DEBUG : Function: Discretization.fit_transform, Time: 1.98 sec
airlines - 2023-04-07 13:05:45 : DEBUG : Discretization for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:45 : DEBUG : Starting CategoricalEncoding for 'Experiment 1' ...
airlines - 2023-04-07 13:05:45 : DEBUG : Function: CategoricalEncoding.fit_transform, Time: 0.77 sec
airlines - 2023-04-07 13:05:45 : DEBUG : CategoricalEncoding for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : DEBUG : Starting FeatureSelection for 'Experiment 1' ...
airlines - 2023-04-07 13:05:46 : DEBUG : Function: FeatureSelection.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:46 : DEBUG : FeatureSelection for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : INFO : Deacto data preparation for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : INFO : Deacto data prep finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:46 : INFO : Starting Deacto modeling for 'Experiment 1' ...
airlines - 2023-04-07 13:05:46 : DEBUG : depth : 0 , direction All , input_path []
airlines - 2023-04-07 13:05:47 : DEBUG : find_best_split - feature_name : Online_boarding, cutoff : 4, utility : 38571.1
airlines - 2023-04-07 13:05:47 : DEBUG : depth : 1 , direction left , input_path ['left']
airlines - 2023-04-07 13:05:49 : DEBUG : find_best_split - feature_name : Online_boarding, cutoff : 3, utility : 53525.1
airlines - 2023-04-07 13:05:49 : DEBUG : depth : 2 , direction left , input_path ['left', 'left']
airlines - 2023-04-07 13:05:49 : DEBUG : depth : 2 , direction right , input_path ['left', 'right']
airlines - 2023-04-07 13:05:49 : DEBUG : depth : 1 , direction right , input_path ['right']
airlines - 2023-04-07 13:05:53 : DEBUG : find_best_split - feature_name : Ease_of_Online_booking, cutoff : 4, utility : 96647.5
airlines - 2023-04-07 13:05:53 : DEBUG : depth : 2 , direction left , input_path ['right', 'left']
airlines - 2023-04-07 13:05:53 : DEBUG : depth : 2 , direction right , input_path ['right', 'right']
airlines - 2023-04-07 13:05:53 : DEBUG : Function: DecisionTreeUtilityClassifier.perform_dtopt_dbact, Time: 7.15 sec
airlines - 2023-04-07 13:05:53 : INFO : Deacto modeling finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:53 : INFO : Starting Deacto business actions extraction for 'Experiment 1' ...
airlines - 2023-04-07 13:05:53 : INFO : Deacto business actions extraction finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:53 : INFO : Deacto finished successfully for 'Experiment 1'.
Deacto output table exploration
[ ]:
dbact_output_table
| | pre_condition | set | change | benefit | costs | utility | ROI | utility_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | Online_boarding : [1, 2, 3] | Ease_of_Online_booking : [5] | Online_boarding : 1 -> [5] , Online_boarding :... | 36317.4 | 0 | 36317.4 | inf | 79.4 |
| 1 | Online_boarding : [4] | Ease_of_Online_booking : [5] | Online_boarding : 4 -> [5] | 8448.7 | 0 | 8448.7 | inf | 18.5 |
| 2 | Online_boarding : [5] , Ease_of_Online_booking... | | Ease_of_Online_booking : 1 -> [5] , Ease_of_On... | 984.2 | 0 | 984.2 | inf | 2.2 |
Deacto output table specification
Each row describes one action set, its description and evaluation metrics.
The business can decide to act on one or more of the action sets in the table.
Columns and their descriptions:
Action description:
pre_condition - defines the subpopulation of objects to which the action is applied
change - actions that move a feature from a source value to a target value
set - actions that set a feature to a specific (target) value regardless of its source value
Action evaluation:
benefit - added value from the action set
costs - overall cost of the action set
utility - lower bound of utility (benefit minus costs)
ROI - lower bound of ROI, the utility-to-costs ratio (%); inf when costs are zero
utility_pct - ratio of the action set utility to total utility (%)
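The evaluation columns appear to be linked by simple arithmetic. The sketch below recomputes utility, ROI and utility_pct from benefit and costs under assumed formulas (utility = benefit - costs, ROI = 100 * utility / costs, utility_pct = 100 * utility / total utility over all action sets). These formulas reproduce the values in both experiments' output tables, but they are inferred from those numbers, not taken from Deacto's source.

```python
import math


def evaluate_action_sets(rows):
    """Recompute utility, ROI and utility_pct from each row's benefit and costs.

    Assumed relations (they match the displayed output tables):
      utility     = benefit - costs
      ROI         = 100 * utility / costs   (inf when costs == 0)
      utility_pct = 100 * utility / sum of utilities over all action sets
    """
    utilities = [r["benefit"] - r["costs"] for r in rows]
    total_utility = sum(utilities)
    result = []
    for r, u in zip(rows, utilities):
        roi = math.inf if r["costs"] == 0 else 100 * u / r["costs"]
        pct = 100 * u / total_utility if total_utility else 0.0
        result.append({
            "utility": round(u, 1),
            "ROI": roi if math.isinf(roi) else round(roi, 1),
            "utility_pct": round(pct, 1),
        })
    return result


# benefit / costs values copied from the Experiment 1 and Experiment 2 output tables
experiment_1 = [{"benefit": 36317.4, "costs": 0},
                {"benefit": 8448.7, "costs": 0},
                {"benefit": 984.2, "costs": 0}]
experiment_2 = [{"benefit": 148488.3, "costs": 107773.0}]

print(evaluate_action_sets(experiment_1))
print(evaluate_action_sets(experiment_2))
```

For Experiment 2, for example, 148488.3 - 107773.0 = 40715.3, and 40715.3 / 107773.0 is about 37.8%, matching the reported ROI.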
Experiment 1 output table exploration
The first action set accounts for ~79.4% of the total utility (utility_pct = 79.4) and should be examined.
[ ]:
dbact_output_table.loc[0].to_dict()
{'pre_condition': 'Online_boarding : [1, 2, 3]',
'set': 'Ease_of_Online_booking : [5]',
'change': 'Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5]',
'benefit': 36317.4,
'costs': 0,
'utility': 36317.4,
'ROI': inf,
'utility_pct': 79.4}
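The change and set cells are stored as display strings. To work with them programmatically, a small parser like the following can help. It assumes the format seen in the output above (' , ' between entries, ' : ' after the feature name, ' -> ' before the list of target values); this format is inferred from the printed values, not documented, and `parse_change` is a hypothetical helper.

```python
import ast


def _literal(text):
    """Evaluate a Python literal such as 3 or [5]; fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def parse_change(change):
    """Split a 'change' cell into (feature, source value, target values) tuples."""
    transitions = []
    if not change:  # an empty cell means no change-type actions
        return transitions
    for part in change.split(" , "):
        feature, move = part.split(" : ", 1)
        source, target = move.split(" -> ", 1)
        transitions.append((feature.strip(), _literal(source.strip()), _literal(target.strip())))
    return transitions


change = "Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5]"
for feature, source, targets in parse_change(change):
    print(f"{feature}: {source} -> {targets}")
```

Splitting on ' , ' assumes no individual value itself contains that separator.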
Experiment 1 - Conclusions
Online_boarding and Ease_of_Online_booking satisfaction levels are the key features for action discovery.
The proposed change is significant: Ease_of_Online_booking is to be increased to the highest satisfaction level (5), and Online_boarding is to be increased from levels 1, 2 and 3 to the highest satisfaction level (5).
The actions make sense in the absence of costs. The expected benefit is ~36,000, which, with 0/1 customer values and zero costs, corresponds to the number of additional customers expected to be satisfied and stay with the company.
This assumes that the company can directly influence the satisfaction levels above.
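To see why a 0/1 value mapping makes the benefit read as a customer count, consider a toy example: the total value of a group of customers under input_bov is the sum of their per-state values, so with 0/1 values it equals the number of satisfied customers. The hypothetical helper `value_gain` and the toy data below only illustrate the mapping; Deacto's benefit is an estimate produced by its model, not this literal count.

```python
def value_gain(states_before, states_after, input_bov):
    """Change in total value when objects move between states, valued by input_bov."""
    before = sum(input_bov[s] for s in states_before)
    after = sum(input_bov[s] for s in states_after)
    return after - before


bov_experiment_1 = {"satisfied": 1, "neutral or dissatisfied": 0}
bov_experiment_2 = {"satisfied": 5, "neutral or dissatisfied": 0}

# Toy data: two of three customers move from dissatisfied to satisfied
before = ["neutral or dissatisfied", "neutral or dissatisfied", "satisfied"]
after = ["satisfied", "satisfied", "satisfied"]

print(value_gain(before, after, bov_experiment_1))  # 2: number of newly satisfied customers
print(value_gain(before, after, bov_experiment_2))  # 10: same customers, valued at 5 each
```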
Experiment 2 - 0/5 business object value + some action costs
Let us consider a more realistic scenario in which the cost of improving the satisfaction level of ‘Ease_of_Online_booking’, ‘Online_boarding’, ‘Inflight_entertainment’ and ‘Inflight_service’ increases linearly with the number of levels raised.
We assign a prohibitively high cost (1000) to any decrease in satisfaction level, so such changes are effectively excluded.
The costs are per customer, i.e. they are variable costs.
We also increase the value of a customer in the desired state (‘satisfied’) from 1 to 5.
Set input costs
[ ]:
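config_input_costs = {}
for f in ['Ease_of_Online_booking', 'Online_boarding', 'Inflight_entertainment', 'Inflight_service']:
    # Cast to plain int so str((i, j)) gives keys like '(2, 5)' under any NumPy version
    f_domain = sorted(int(v) for v in airlines_data[f].unique())
    # Moving from level i to level j: linear cost for increases, prohibitive cost (1000) for decreases
    i_costs = {str((i, j)): (j - i) if j >= i else 1000 for i in f_domain for j in f_domain}
    config_input_costs[f] = i_costs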
print(f"Cost of changing 'Online_boarding' from 2 to 5: {config_input_costs['Online_boarding']['(2, 5)']}")
print(f"Cost of changing 'Ease_of_Online_booking' from 3 to 2: {config_input_costs['Ease_of_Online_booking']['(3, 2)']}")
Cost of changing 'Online_boarding' from 2 to 5: 3
Cost of changing 'Ease_of_Online_booking' from 3 to 2: 1000
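Because the costs are per customer, the cost of an action on a subpopulation adds up over its members. A sketch using the same cost rule as the cell above, applied to a change of the kind Experiment 2 ends up proposing (raising Online_boarding from levels 1-4 to 5). The group counts are made up, and whether Deacto aggregates costs in exactly this way is an assumption.

```python
def change_cost(source, target, penalty=1000):
    """Per-customer cost of moving a satisfaction level, following the rule above:
    linear in the number of levels raised, a flat penalty for any decrease."""
    return target - source if target >= source else penalty


# Hypothetical group sizes: number of customers at each Online_boarding level 1-4
counts_by_level = {1: 10, 2: 20, 3: 30, 4: 40}

# Raising everyone in the group to level 5: 10*4 + 20*3 + 30*2 + 40*1
total_cost = sum(n * change_cost(level, 5) for level, n in counts_by_level.items())
print(total_cost)  # 200
```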
Set user config
[ ]:
config_yaml_str = (f"""
DEA_SYS_USER_CONFIG:
  DEA_SYS_PROJECT_NAME: airlines
  DEA_EXPERIMENT_ID: Experiment 2
  DEA_SYS_OUTPUT_FILE: True
DEA_USER_CONFIG:
  Data:
    bo_id: id
    target_feature: satisfaction
  Dbact:
    input_bov:
      'satisfied': 5
      'neutral or dissatisfied': 0
    input_nafs: ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age']
    input_costs: {config_input_costs}
""")
write_deacto_user_config(DEA_SYS_PROJ_DIR, config_yaml_str)
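Interpolating a Python dict into an f-string relies on the dict's repr happening to be valid YAML. An alternative is to build the config as a nested dict and serialize it. In the sketch below, `json` from the standard library is used only to display the structure; in the notebook, `yaml.safe_dump(config, file)` would write the YAML file directly. The helper `build_user_config` and its defaults are hypothetical, and the 1-5 level domain is an assumption standing in for the values read from the data.

```python
import json


def build_user_config(experiment_id, input_bov, input_nafs, input_costs,
                      project_name="airlines", bo_id="id",
                      target_feature="satisfaction", output_file=True):
    """Assemble the user config as a plain nested dict (same layout as the YAML above)."""
    return {
        "DEA_SYS_USER_CONFIG": {
            "DEA_SYS_PROJECT_NAME": project_name,
            "DEA_EXPERIMENT_ID": experiment_id,
            "DEA_SYS_OUTPUT_FILE": output_file,
        },
        "DEA_USER_CONFIG": {
            "Data": {"bo_id": bo_id, "target_feature": target_feature},
            "Dbact": {
                "input_bov": input_bov,
                "input_nafs": input_nafs,
                "input_costs": input_costs,
            },
        },
    }


# Assumed level domain 1-5; the notebook derives it from the data instead
levels = range(1, 6)
online_boarding_costs = {str((i, j)): (j - i) if j >= i else 1000 for i in levels for j in levels}

config = build_user_config(
    "Experiment 2",
    {"satisfied": 5, "neutral or dissatisfied": 0},
    ["Customer_Type", "Type_of_Travel", "Class", "Gender", "Age"],
    {"Online_boarding": online_boarding_costs},
)
config_text = json.dumps(config, indent=2)
print(config_text[:200])
```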
Run Deacto
[ ]:
from deacto import deacto
deacto_analysis = deacto.Deacto(
DEA_SYS_PROJ_DIR,
airlines_data
)
dbact_output_table = deacto_analysis.perform_deacto_analysis()
airlines - 2023-04-07 13:05:54 : INFO : Deacto configuration finished successfully .
airlines - 2023-04-07 13:05:54 : INFO : Starting Deacto for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : INFO : Starting Deacto data preparation for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : INFO : Input data validation finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:05:54 : INFO : Input data validation for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:54 : DEBUG : Starting OutlierHandler for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : DEBUG : Function: OutlierHandler.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:54 : DEBUG : OutlierHandler for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:54 : DEBUG : Starting Discretization for 'Experiment 2' ...
airlines - 2023-04-07 13:05:55 : DEBUG : Function: Discretization.fit_transform, Time: 1.21 sec
airlines - 2023-04-07 13:05:55 : DEBUG : Discretization for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:56 : DEBUG : Starting CategoricalEncoding for 'Experiment 2' ...
airlines - 2023-04-07 13:05:56 : DEBUG : Function: CategoricalEncoding.fit_transform, Time: 0.8 sec
airlines - 2023-04-07 13:05:56 : DEBUG : CategoricalEncoding for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : DEBUG : Starting FeatureSelection for 'Experiment 2' ...
airlines - 2023-04-07 13:05:57 : DEBUG : Function: FeatureSelection.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:57 : DEBUG : FeatureSelection for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : INFO : Deacto data preparation for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : INFO : Deacto data prep finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:05:57 : INFO : Starting Deacto modeling for 'Experiment 2' ...
airlines - 2023-04-07 13:05:57 : DEBUG : depth : 0 , direction All , input_path []
airlines - 2023-04-07 13:05:58 : DEBUG : find_best_split - feature_name : Online_boarding, cutoff : 4, utility : 25836.3
airlines - 2023-04-07 13:05:58 : DEBUG : depth : 1 , direction left , input_path ['left']
airlines - 2023-04-07 13:06:03 : DEBUG : find_best_split - feature_name : Class, cutoff : 1, utility : 40715.3
airlines - 2023-04-07 13:06:03 : DEBUG : depth : 2 , direction left , input_path ['left', 'left']
airlines - 2023-04-07 13:06:03 : DEBUG : depth : 2 , direction right , input_path ['left', 'right']
airlines - 2023-04-07 13:06:03 : DEBUG : depth : 1 , direction right , input_path ['right']
airlines - 2023-04-07 13:06:21 : DEBUG : Function: DecisionTreeUtilityClassifier.perform_dtopt_dbact, Time: 24.09 sec
airlines - 2023-04-07 13:06:21 : INFO : Deacto modeling finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:06:21 : INFO : Starting Deacto business actions extraction for 'Experiment 2' ...
airlines - 2023-04-07 13:06:21 : INFO : Deacto business actions extraction finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:06:21 : INFO : Deacto finished successfully for 'Experiment 2'.
Deacto output table exploration
[ ]:
dbact_output_table
| | pre_condition | set | change | benefit | costs | utility | ROI | utility_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | Online_boarding : [1, 2, 3, 4] , Class : ['Eco... | | Online_boarding : 1 -> [5] , Online_boarding :... | 148488.3 | 107773.0 | 40715.3 | 37.8 | 100.0 |
Experiment 2 output table exploration
Only one action set is found. It accounts for all of the utility (utility_pct = 100.0) and should be examined.
[ ]:
dbact_output_table.loc[0].to_dict()
{'pre_condition': "Online_boarding : [1, 2, 3, 4] , Class : ['Eco', 'Eco Plus']",
'set': '',
'change': 'Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5] , Online_boarding : 4 -> [5]',
'benefit': 148488.3,
'costs': 107773.0,
'utility': 40715.3,
'ROI': 37.8,
'utility_pct': 100.0}
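Similarly, the pre_condition string can be turned into per-feature sets of allowed values to select the customers an action set targets. A minimal sketch, assuming the ' , ' / ' : ' format shown above and that each value list is a valid Python literal; `parse_pre_condition`, `matches` and the customer records are hypothetical.

```python
import ast


def parse_pre_condition(pre_condition):
    """Turn "feat : [v1, v2] , feat2 : [...]" into {feat: set of allowed values}."""
    conditions = {}
    if not pre_condition:
        return conditions
    for part in pre_condition.split(" , "):
        feature, values = part.split(" : ", 1)
        conditions[feature.strip()] = set(ast.literal_eval(values.strip()))
    return conditions


def matches(record, conditions):
    """True if the record satisfies every feature condition."""
    return all(record.get(f) in allowed for f, allowed in conditions.items())


pre_condition = "Online_boarding : [1, 2, 3, 4] , Class : ['Eco', 'Eco Plus']"
conditions = parse_pre_condition(pre_condition)

# Hypothetical customer records, for illustration only
customers = [
    {"id": 1, "Online_boarding": 2, "Class": "Eco"},
    {"id": 2, "Online_boarding": 5, "Class": "Eco"},
    {"id": 3, "Online_boarding": 3, "Class": "Business"},
    {"id": 4, "Online_boarding": 4, "Class": "Eco Plus"},
]
selected_ids = [c["id"] for c in customers if matches(c, conditions)]
print(selected_ids)  # [1, 4]
```

On the full airlines_data DataFrame, the same filter can be expressed with pandas `Series.isin` for each feature in the condition.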
Experiment 2 - Conclusions
The Online_boarding satisfaction level is the most important feature for action discovery.
The proposed change is significant: Online_boarding is to be increased from levels 1, 2, 3 and 4 to the highest satisfaction level (5) for customers in the ‘Eco’ and ‘Eco Plus’ classes. At a cost of 107,773.0 it yields a utility of 40,715.3 (ROI 37.8%).
While it makes sense that customers would already be satisfied with ‘Business’ class service, to optimize the benefit for the company it is recommended to prioritize efforts on online boarding for ‘Eco’ and ‘Eco Plus’ customers.