Deacto Basic Notebook

[1]:

%reload_ext autoreload
%autoreload 2

Set project directory

[2]:

import os

DEA_SYS_PROJECT_NAME = 'airlines'
notebook_dir = os.getcwd()
DEA_SYS_PROJ_DIR = os.path.join(notebook_dir, DEA_SYS_PROJECT_NAME)

Data description

Feature set :

Gender: Gender of the passengers (Female, Male)
Customer Type: The customer type (Loyal customer, disloyal customer)
Age: The actual age of the passengers
Type of Travel: Purpose of the flight of the passengers (Personal Travel, Business Travel)
Class: Travel class in the plane of the passengers (Business, Eco, Eco Plus)
Flight distance: The flight distance of this journey
Inflight wifi service: Satisfaction level of the inflight wifi service (0:Not Applicable;1-5)
Departure/Arrival time convenient: Satisfaction level with the convenience of departure/arrival times
Ease of Online booking: Satisfaction level of online booking
Gate location: Satisfaction level of Gate location
Food and drink: Satisfaction level of Food and drink
Online boarding: Satisfaction level of online boarding
Seat comfort: Satisfaction level of Seat comfort
Inflight entertainment: Satisfaction level of inflight entertainment
On-board service: Satisfaction level of On-board service
Leg room service: Satisfaction level of Leg room service
Baggage handling: Satisfaction level of baggage handling
Check-in service: Satisfaction level of Check-in service
Inflight service: Satisfaction level of inflight service
Cleanliness: Satisfaction level of Cleanliness
Departure Delay in Minutes: Departure delay in minutes
Arrival Delay in Minutes: Arrival delay in minutes

Satisfaction: Airline satisfaction level (satisfied, neutral or dissatisfied)

https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction

Read data

We use a subset of the input features.

[3]:

import pandas as pd

airlines_data = pd.read_csv(os.path.join(DEA_SYS_PROJ_DIR, 'airlines_data.csv'))

print(f"Number of customers : {airlines_data.shape[0]}")
print("Features:\n" + "\n".join(airlines_data.columns))

Number of customers : 99362
Features:
id
Customer_Type
Type_of_Travel
Class
Gender
Age
Ease_of_Online_booking
Online_boarding
Inflight_service
Inflight_entertainment
satisfaction

User config specification

DEA_SYS_USER_CONFIG
DEA_SYS_PROJECT_NAME - project name; must match the project's folder name
DEA_EXPERIMENT_ID - distinguishes Deacto runs that use different sets of config values
DEA_SYS_OUTPUT_FILE - flag (True/False) indicating whether to write the output table to a .csv file
DEA_USER_CONFIG
Data
- bo_id - column holding the business object identifier. Use None if no such column exists
- target_feature - column in the data that represents the business object state
Dbact
- input_bov - function that maps each object state to its value for the company
- input_nafs - the subset of input features that are outside the company's control and cannot be changed
- input_costs - set of functions that map each possible business action to its monetary cost for one business object
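The structure described above can be checked before a run. The following is a minimal sketch and not part of Deacto: `REQUIRED_KEYS` and `missing_config_keys` are hypothetical names, treating every listed key as required is an assumption, and the config is assumed to be a plain dict of the kind a YAML parser returns.

```python
# Hypothetical helper (not part of Deacto): report which expected keys are missing.
# Assumption: every key listed in the specification above is required.
REQUIRED_KEYS = {
    "DEA_SYS_USER_CONFIG": ["DEA_SYS_PROJECT_NAME", "DEA_EXPERIMENT_ID", "DEA_SYS_OUTPUT_FILE"],
    "DEA_USER_CONFIG": {
        "Data": ["bo_id", "target_feature"],
        "Dbact": ["input_bov", "input_nafs", "input_costs"],
    },
}


def missing_config_keys(config, required=REQUIRED_KEYS, prefix=""):
    """Return dotted paths of expected keys that are absent from config."""
    missing = []
    # A list names leaf keys; a dict names nested sections to descend into.
    names = required if isinstance(required, list) else list(required)
    for name in names:
        path = f"{prefix}{name}"
        if not isinstance(config, dict) or name not in config:
            missing.append(path)
        elif isinstance(required, dict):
            missing.extend(missing_config_keys(config[name], required[name], prefix=f"{path}."))
    return missing


# Example config with input_costs left out on purpose.
example_config = {
    "DEA_SYS_USER_CONFIG": {
        "DEA_SYS_PROJECT_NAME": "airlines",
        "DEA_EXPERIMENT_ID": "Experiment 1",
        "DEA_SYS_OUTPUT_FILE": True,
    },
    "DEA_USER_CONFIG": {
        "Data": {"bo_id": "id", "target_feature": "satisfaction"},
        "Dbact": {"input_bov": {"satisfied": 1, "neutral or dissatisfied": 0}, "input_nafs": ["Age"]},
    },
}

print(missing_config_keys(example_config))  # ['DEA_USER_CONFIG.Dbact.input_costs']
```

An empty list means all expected sections and keys are present; the check says nothing about whether the values themselves are valid.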

Experiment 1 - 0/1 business object value + zero action costs

Set user config

Deacto requires a set of parameters that the user provides in YAML format. The YAML file can be prepared in advance and saved in the project directory. Here we show how to prepare the file inside the notebook.

[4]:

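import yaml

def write_deacto_user_config(DEA_SYS_PROJ_DIR, config_yaml_str):
    """Save the YAML config string as deacto_user_config.yaml in the project directory."""
    config_file = os.path.join(DEA_SYS_PROJ_DIR, 'deacto_user_config.yaml')
    # Parse first so that malformed YAML fails before the file is opened for writing
    config = yaml.safe_load(config_yaml_str)
    with open(config_file, 'w') as file:
        yaml.dump(config, file)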

[5]:

config_input_costs = {}

deacto_user_config_yaml_str = (f"""
DEA_SYS_USER_CONFIG:
  DEA_SYS_PROJECT_NAME: airlines
  DEA_EXPERIMENT_ID: Experiment 1
  DEA_SYS_OUTPUT_FILE : True
DEA_USER_CONFIG:
  Data:
    bo_id: id
    target_feature: satisfaction
  Dbact:
    input_bov:
      'satisfied': 1
      'neutral or dissatisfied': 0
    input_nafs: ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age']
    input_costs: {config_input_costs}
""")

write_deacto_user_config(DEA_SYS_PROJ_DIR, deacto_user_config_yaml_str)

User config description

DEA_SYS_USER_CONFIG
DEA_SYS_PROJECT_NAME - airlines - corresponds to the dataset topic
DEA_EXPERIMENT_ID - Experiment 1 - simple run order
DEA_SYS_OUTPUT_FILE - True
DEA_USER_CONFIG
Data
- bo_id - id; the data already contains a business object identifier, where each business object is a customer of the company
- target_feature - satisfaction, which has two possible values
Dbact
- input_bov - a value of 1 for a customer who is 'satisfied' and a value of 0 for a customer who is 'neutral or dissatisfied', i.e. 0/1 customer values
- input_nafs - ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age'] - features that relate to personal attributes of the customer or to company attributes and cannot be changed
- input_costs - empty, i.e. zero costs
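To illustrate what input_bov expresses, the sketch below applies the 0/1 mapping to a few made-up target values and sums the result. The `states` list is illustrative data, not taken from the dataset, and `total_value` is a hypothetical helper rather than a Deacto function.

```python
# input_bov from Experiment 1: value of each business object state for the company.
input_bov = {"satisfied": 1, "neutral or dissatisfied": 0}

# Made-up target values for five customers (illustrative only).
states = [
    "satisfied",
    "neutral or dissatisfied",
    "satisfied",
    "neutral or dissatisfied",
    "neutral or dissatisfied",
]


def total_value(states, bov):
    """Sum the business value of all objects given their current states."""
    return sum(bov[state] for state in states)


# With 0/1 values the total is simply the number of satisfied customers.
print(total_value(states, input_bov))  # 2
```

Changing the value assigned to 'satisfied' rescales the total without changing which customers count, which is why the mapping matters mostly once action costs are introduced.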

Initialize Deacto

[6]:

from deacto import deacto

deacto_analysis = deacto.Deacto(
    DEA_SYS_PROJ_DIR,
    airlines_data
)

Perform deacto analysis

[ ]:

dbact_output_table = deacto_analysis.perform_deacto_analysis()

airlines - 2023-04-07 13:05:42 : INFO     : Starting Deacto for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : INFO     : Starting Deacto data preparation for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : INFO     : Input data validation finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:42 : INFO     : Input data validation for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:42 : DEBUG    : Starting OutlierHandler for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : DEBUG    : Function: OutlierHandler.fit_transform, Time: 0.01 sec
airlines - 2023-04-07 13:05:42 : DEBUG    : OutlierHandler for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:43 : DEBUG    : Starting Discretization for 'Experiment 1' ...
airlines - 2023-04-07 13:05:45 : DEBUG    : Function: Discretization.fit_transform, Time: 1.98 sec
airlines - 2023-04-07 13:05:45 : DEBUG    : Discretization for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:45 : DEBUG    : Starting CategoricalEncoding for 'Experiment 1' ...
airlines - 2023-04-07 13:05:45 : DEBUG    : Function: CategoricalEncoding.fit_transform, Time: 0.77 sec
airlines - 2023-04-07 13:05:45 : DEBUG    : CategoricalEncoding for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : DEBUG    : Starting FeatureSelection for 'Experiment 1' ...
airlines - 2023-04-07 13:05:46 : DEBUG    : Function: FeatureSelection.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:46 : DEBUG    : FeatureSelection for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : INFO     : Deacto data preparation for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : INFO     : Deacto data prep finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:46 : INFO     : Starting Deacto modeling for 'Experiment 1' ...
airlines - 2023-04-07 13:05:46 : DEBUG    : depth : 0 , direction All , input_path []
airlines - 2023-04-07 13:05:47 : DEBUG    : find_best_split - feature_name : Online_boarding, cutoff : 4, utility : 38571.1
airlines - 2023-04-07 13:05:47 : DEBUG    : depth : 1 , direction left , input_path ['left']
airlines - 2023-04-07 13:05:49 : DEBUG    : find_best_split - feature_name : Online_boarding, cutoff : 3, utility : 53525.1
airlines - 2023-04-07 13:05:49 : DEBUG    : depth : 2 , direction left , input_path ['left', 'left']
airlines - 2023-04-07 13:05:49 : DEBUG    : depth : 2 , direction right , input_path ['left', 'right']
airlines - 2023-04-07 13:05:49 : DEBUG    : depth : 1 , direction right , input_path ['right']
airlines - 2023-04-07 13:05:53 : DEBUG    : find_best_split - feature_name : Ease_of_Online_booking, cutoff : 4, utility : 96647.5
airlines - 2023-04-07 13:05:53 : DEBUG    : depth : 2 , direction left , input_path ['right', 'left']
airlines - 2023-04-07 13:05:53 : DEBUG    : depth : 2 , direction right , input_path ['right', 'right']
airlines - 2023-04-07 13:05:53 : DEBUG    : Function: DecisionTreeUtilityClassifier.perform_dtopt_dbact, Time: 7.15 sec
airlines - 2023-04-07 13:05:53 : INFO     : Deacto modeling finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:53 : INFO     : Starting Deacto business actions extraction for 'Experiment 1' ...
airlines - 2023-04-07 13:05:53 : INFO     : Deacto business actions extraction finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:53 : INFO     : Deacto finished successfully for 'Experiment 1'.

Deacto output table exploration

[ ]:

dbact_output_table

	pre_condition	set	change	benefit	utility	ROI	utility_pct
0	Online_boarding : [1, 2, 3]	Ease_of_Online_booking : [5]	Online_boarding : 1 -> [5] , Online_boarding :...	36317.4	36317.4	inf	79.4
1	Online_boarding : [4]	Ease_of_Online_booking : [5]	Online_boarding : 4 -> [5]	8448.7	8448.7	inf	18.5
2	Online_boarding : [5] , Ease_of_Online_booking...		Ease_of_Online_booking : 1 -> [5] , Ease_of_On...	984.2	984.2	inf	2.2

Deacto output table specification

Each row describes one action set together with its evaluation metrics.

The business can decide to act on one or more action sets in the table.

Columns and their description :

Action description :
- pre_condition - defines the subpopulation of objects to which the action set is applied
- change - actions that change a feature from a specific source value to a target value
- set - actions that set a feature to a specific (target) value regardless of its source value
Action evaluation :
- benefit - added value from the action set
- costs - overall cost of the action set
- utility - lower bound of utility
- ROI - lower bound of ROI, the utility-to-costs ratio (%)
- utility_pct - ratio of the action set utility to total utility (%)
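The utility_pct column can be reproduced from the utility column. The sketch below uses the three utility values shown in the Experiment 1 output table and assumes that total utility is the sum over the listed action sets, which is consistent with the reported percentages. `utility_pct` here is a hypothetical helper, not a Deacto function.

```python
def utility_pct(utilities):
    """Share of each action set's utility in the total, in percent, rounded to 1 decimal."""
    total = sum(utilities)
    return [round(100 * u / total, 1) for u in utilities]


# Utility values of the three action sets in the Experiment 1 output table.
experiment_1_utilities = [36317.4, 8448.7, 984.2]
print(utility_pct(experiment_1_utilities))  # [79.4, 18.5, 2.2]
```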

Experiment 1 output table exploration

The first action set accounts for about 79.4% of the total utility (utility_pct = 79.4) and should be examined.

[ ]:

dbact_output_table.loc[0].to_dict()

{'pre_condition': 'Online_boarding : [1, 2, 3]',
 'set': 'Ease_of_Online_booking : [5]',
 'change': 'Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5]',
 'benefit': 36317.4,
 'costs': 0,
 'utility': 36317.4,
 'ROI': inf,
 'utility_pct': 79.4}
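The change column is a plain string, so it can help to split it into structured records before further processing. The sketch below assumes the format seen in the output above ("feature : source -> [target]" items separated by " , "); `parse_change` and `CHANGE_PATTERN` are hypothetical names and not part of Deacto.

```python
import re

# Assumed item format, inferred from the output above: "feature : source -> [target]"
CHANGE_PATTERN = re.compile(
    r"^\s*(?P<feature>\w+)\s*:\s*(?P<source>.+?)\s*->\s*\[(?P<target>.+?)\]\s*$"
)


def parse_change(change_str):
    """Split a change string into (feature, source, target) tuples of strings."""
    if not change_str.strip():
        return []  # an empty column means no change actions
    parsed = []
    for part in change_str.split(" , "):
        match = CHANGE_PATTERN.match(part)
        if match is None:
            raise ValueError(f"Unrecognized change item: {part!r}")
        parsed.append((match["feature"], match["source"], match["target"]))
    return parsed


change = "Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5]"
for feature, source, target in parse_change(change):
    print(f"{feature}: {source} -> {target}")
```

Values are kept as strings because the same column may hold categorical levels; convert them only when the feature is known to be numeric.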

Experiment 1 - Conclusions

The Online_boarding and Ease_of_Online_booking satisfaction levels are the important features for action discovery.
The change is very significant: Ease_of_Online_booking is proposed to be set to the highest satisfaction level (5), and Online_boarding is proposed to be increased from levels 1, 2 and 3 to the highest satisfaction level (5).
The actions make sense in the absence of costs. The expected benefit is ~ 36,000, which, with 0/1 values and zero costs, corresponds to the number of additional customers expected to be satisfied.
This assumes that the company is able to directly influence the satisfaction levels above.

Experiment 2 - 0/1 business object value + some action costs

Let us consider a more realistic scenario in which improving a satisfaction level, for example for 'Ease_of_Online_booking' or 'Online_boarding', has a cost that increases linearly with the number of levels gained.

We assign an extremely high cost (1000) to decreasing a satisfaction level, since such a change is not acceptable.

The costs apply to each customer, i.e. they are variable costs.

We also increase the value of a customer in the desired state ('satisfied') to 5.

Set input costs

[ ]:

config_input_costs = {}

for f in ['Ease_of_Online_booking','Online_boarding','Inflight_entertainment','Inflight_service']:
    f_domain = sorted(int(v) for v in airlines_data[f].unique())  # plain ints keep keys like '(2, 5)' independent of the NumPy scalar repr
    i_costs  = {str((i,j)):  (j - i) if j >= i else 1000 for i in f_domain for j in f_domain}
    config_input_costs[f] = i_costs

print(f"'Online_boarding' changing from 2 to 5 costs is : {config_input_costs['Online_boarding']['(2, 5)']}")
print(f"'Ease_of_Online_booking' changing from 3 to 2 costs is : {config_input_costs['Ease_of_Online_booking']['(3, 2)']}")

'Online_boarding' changing from 2 to 5 costs is : 3
'Ease_of_Online_booking' changing from 3 to 2 costs is : 1000
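The cost rule used above can also be written as a standalone function, which makes the two cases easier to see. This is a minimal sketch: `change_cost`, `unit_cost` and `forbidden_cost` are hypothetical names, and the 1 to 5 domain is assumed for illustration only.

```python
def change_cost(source, target, unit_cost=1, forbidden_cost=1000):
    """Per-customer cost of moving a satisfaction level from source to target."""
    if target >= source:
        # Improving (or keeping) a level costs linearly in the number of levels gained.
        return unit_cost * (target - source)
    # Decreasing a level is priced so high that it is never worthwhile.
    return forbidden_cost


levels = [1, 2, 3, 4, 5]  # assumed domain, for illustration
for source in levels:
    # One row per source level, one column per target level.
    print(source, [change_cost(source, target) for target in levels])
```

With these defaults, `change_cost(2, 5)` and `change_cost(3, 2)` return the same values as the two printed lookups, 3 and 1000.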

Set user config

[ ]:

config_yaml_str = (f"""
DEA_SYS_USER_CONFIG:
  DEA_SYS_PROJECT_NAME: airlines
  DEA_EXPERIMENT_ID: Experiment 2
  DEA_SYS_OUTPUT_FILE : True
DEA_USER_CONFIG:
  Data:
    bo_id: id
    target_feature: satisfaction
  Dbact:
    input_bov:
      'satisfied': 5
      'neutral or dissatisfied': 0
    input_nafs: ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age']
    input_costs: {config_input_costs}
""")

write_deacto_user_config(DEA_SYS_PROJ_DIR, config_yaml_str)

Run Deacto

[ ]:

from deacto import deacto

deacto_analysis = deacto.Deacto(
    DEA_SYS_PROJ_DIR,
    airlines_data
)

dbact_output_table = deacto_analysis.perform_deacto_analysis()

airlines - 2023-04-07 13:05:54 : INFO     : Deacto configuration finished successfully .
airlines - 2023-04-07 13:05:54 : INFO     : Starting Deacto for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : INFO     : Starting Deacto data preparation for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : INFO     : Input data validation finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:05:54 : INFO     : Input data validation for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:54 : DEBUG    : Starting OutlierHandler for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : DEBUG    : Function: OutlierHandler.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:54 : DEBUG    : OutlierHandler for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:54 : DEBUG    : Starting Discretization for 'Experiment 2' ...
airlines - 2023-04-07 13:05:55 : DEBUG    : Function: Discretization.fit_transform, Time: 1.21 sec
airlines - 2023-04-07 13:05:55 : DEBUG    : Discretization for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:56 : DEBUG    : Starting CategoricalEncoding for 'Experiment 2' ...
airlines - 2023-04-07 13:05:56 : DEBUG    : Function: CategoricalEncoding.fit_transform, Time: 0.8 sec
airlines - 2023-04-07 13:05:56 : DEBUG    : CategoricalEncoding for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : DEBUG    : Starting FeatureSelection for 'Experiment 2' ...
airlines - 2023-04-07 13:05:57 : DEBUG    : Function: FeatureSelection.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:57 : DEBUG    : FeatureSelection for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : INFO     : Deacto data preparation for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : INFO     : Deacto data prep finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:05:57 : INFO     : Starting Deacto modeling for 'Experiment 2' ...
airlines - 2023-04-07 13:05:57 : DEBUG    : depth : 0 , direction All , input_path []
airlines - 2023-04-07 13:05:58 : DEBUG    : find_best_split - feature_name : Online_boarding, cutoff : 4, utility : 25836.3
airlines - 2023-04-07 13:05:58 : DEBUG    : depth : 1 , direction left , input_path ['left']
airlines - 2023-04-07 13:06:03 : DEBUG    : find_best_split - feature_name : Class, cutoff : 1, utility : 40715.3
airlines - 2023-04-07 13:06:03 : DEBUG    : depth : 2 , direction left , input_path ['left', 'left']
airlines - 2023-04-07 13:06:03 : DEBUG    : depth : 2 , direction right , input_path ['left', 'right']
airlines - 2023-04-07 13:06:03 : DEBUG    : depth : 1 , direction right , input_path ['right']
airlines - 2023-04-07 13:06:21 : DEBUG    : Function: DecisionTreeUtilityClassifier.perform_dtopt_dbact, Time: 24.09 sec
airlines - 2023-04-07 13:06:21 : INFO     : Deacto modeling finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:06:21 : INFO     : Starting Deacto business actions extraction for 'Experiment 2' ...
airlines - 2023-04-07 13:06:21 : INFO     : Deacto business actions extraction finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:06:21 : INFO     : Deacto finished successfully for 'Experiment 2'.

Deacto output table exploration

[ ]:

dbact_output_table

	pre_condition	set	change	benefit	costs	utility	ROI	utility_pct
0	Online_boarding : [1, 2, 3, 4] , Class : ['Eco...		Online_boarding : 1 -> [5] , Online_boarding :...	148488.3	107773.0	40715.3	37.8	100.0

Experiment 2 output table exploration

Only one action set is returned; it accounts for all of the total utility (utility_pct = 100.0) and should be examined.

[ ]:

dbact_output_table.loc[0].to_dict()

{'pre_condition': "Online_boarding : [1, 2, 3, 4] , Class : ['Eco', 'Eco Plus']",
 'set': '',
 'change': 'Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5] , Online_boarding : 4 -> [5]',
 'benefit': 148488.3,
 'costs': 107773.0,
 'utility': 40715.3,
 'ROI': 37.8,
 'utility_pct': 100.0}
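The evaluation columns in this row are related by simple arithmetic. The sketch below assumes utility = benefit - costs and ROI = utility / costs in percent, which reproduces the reported numbers; since Deacto describes utility and ROI as lower bounds, its internal computation may differ. `utility_and_roi` is a hypothetical helper.

```python
import math


def utility_and_roi(benefit, costs):
    """Return (utility, ROI in %) rounded to 1 decimal; ROI is infinite at zero cost."""
    utility = benefit - costs
    roi = math.inf if costs == 0 else 100 * utility / costs
    return round(utility, 1), round(roi, 1)


print(utility_and_roi(148488.3, 107773.0))  # Experiment 2, row 0: (40715.3, 37.8)
print(utility_and_roi(36317.4, 0))          # Experiment 1, row 0: (36317.4, inf)
```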

Experiment 2 - Conclusions

The Online_boarding satisfaction level is the most important feature for action discovery.
The change is very significant: Online_boarding is proposed to be increased to the highest satisfaction level (5) for the 'Eco' and 'Eco Plus' classes.
While it makes sense that customers would be satisfied with 'Business' class service, to optimize benefits the company is recommended to prioritize its effort on online boarding.