Deacto Basic Notebook

[1]:
%reload_ext autoreload
%autoreload 2

Set project directory

[2]:
import os

DEA_SYS_PROJECT_NAME = 'airlines'
notebook_dir = os.getcwd()
DEA_SYS_PROJ_DIR = os.path.join(notebook_dir, DEA_SYS_PROJECT_NAME)

Data description

Feature set :

  1. Gender: Gender of the passengers (Female, Male)

  2. Customer Type: The customer type (Loyal customer, disloyal customer)

  3. Age: The actual age of the passengers

  4. Type of Travel: Purpose of the flight of the passengers (Personal Travel, Business Travel)

  5. Class: Travel class in the plane of the passengers (Business, Eco, Eco Plus)

  6. Flight distance: The flight distance of this journey

  7. Inflight wifi service: Satisfaction level of the inflight wifi service (0:Not Applicable;1-5)

  8. Departure/Arrival time convenient: Satisfaction level with the convenience of the departure/arrival time

  9. Ease of Online booking: Satisfaction level of online booking

  10. Gate location: Satisfaction level of Gate location

  11. Food and drink: Satisfaction level of Food and drink

  12. Online boarding: Satisfaction level of online boarding

  13. Seat comfort: Satisfaction level of Seat comfort

  14. Inflight entertainment: Satisfaction level of inflight entertainment

  15. On-board service: Satisfaction level of On-board service

  16. Leg room service: Satisfaction level of Leg room service

  17. Baggage handling: Satisfaction level of baggage handling

  18. Check-in service: Satisfaction level of Check-in service

  19. Inflight service: Satisfaction level of inflight service

  20. Cleanliness: Satisfaction level of Cleanliness

  21. Departure Delay in Minutes: Minutes of delay at departure

  22. Arrival Delay in Minutes: Minutes of delay at arrival

Satisfaction: Airline satisfaction level (satisfied, neutral or dissatisfied)

https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction

Read data

We use a subset of the input features.

[3]:
import pandas as pd

airlines_data = pd.read_csv(os.path.join(DEA_SYS_PROJ_DIR, 'airlines_data.csv'))

print(f"Number of customers : {airlines_data.shape[0]}")
print("Features:\n" + "\n".join([f for f in airlines_data.columns]))
Number of customers : 99362
Features:
id
Customer_Type
Type_of_Travel
Class
Gender
Age
Ease_of_Online_booking
Online_boarding
Inflight_service
Inflight_entertainment
satisfaction

User config specification

  • DEA_SYS_USER_CONFIG

    • DEA_SYS_PROJECT_NAME - project name; must match the project’s folder name

    • DEA_EXPERIMENT_ID - identifier that distinguishes Deacto runs with different sets of config values

    • DEA_SYS_OUTPUT_FILE - flag (True/False) indicating whether to write the output table to a .csv file

  • DEA_USER_CONFIG

    • Data

      • bo_id - column holding the business object identifier. Use None if no such column exists

      • target_feature - column in the data that represents the business object state

    • Dbact

      • input_bov - mapping from each object state to its value for the company

      • input_nafs - the subset of input features that are outside the company’s control and cannot be changed

      • input_costs - set of functions that map each possible business action to the monetary cost of making that change for one business object
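
As a rough illustration of the layout above, the sketch below writes the same keys as a nested Python dict and reports any that are missing. The helper `check_user_config` and the `REQUIRED_KEYS` table are hypothetical names introduced here for illustration; they are not part of Deacto, whose own validation may differ.

```python
# Hypothetical helper: not part of Deacto, it only mirrors the key layout listed above.
REQUIRED_KEYS = {
    ("DEA_SYS_USER_CONFIG",): ["DEA_SYS_PROJECT_NAME", "DEA_EXPERIMENT_ID", "DEA_SYS_OUTPUT_FILE"],
    ("DEA_USER_CONFIG", "Data"): ["bo_id", "target_feature"],
    ("DEA_USER_CONFIG", "Dbact"): ["input_bov", "input_nafs", "input_costs"],
}


def check_user_config(config):
    """Return a list of dotted 'section.key' names that are missing from config."""
    missing = []
    for path, keys in REQUIRED_KEYS.items():
        section = config
        # Walk down the nested sections; treat anything absent as an empty section.
        for name in path:
            section = section.get(name, {}) if isinstance(section, dict) else {}
        if not isinstance(section, dict):
            section = {}
        for key in keys:
            if key not in section:
                missing.append(".".join(path + (key,)))
    return missing


example_config = {
    "DEA_SYS_USER_CONFIG": {
        "DEA_SYS_PROJECT_NAME": "airlines",
        "DEA_EXPERIMENT_ID": "Experiment 1",
        "DEA_SYS_OUTPUT_FILE": True,
    },
    "DEA_USER_CONFIG": {
        "Data": {"bo_id": "id", "target_feature": "satisfaction"},
        "Dbact": {
            "input_bov": {"satisfied": 1, "neutral or dissatisfied": 0},
            "input_nafs": ["Customer_Type", "Type_of_Travel", "Class", "Gender", "Age"],
            "input_costs": {},
        },
    },
}

print(check_user_config(example_config))  # prints []
```

An empty list means every key from the specification is present; the check says nothing about whether the values themselves are acceptable to Deacto.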

Experiment 1 - 0/1 business object value + zero action costs

Set user config

Deacto requires a set of parameters to be provided by the user in YAML format. This YAML file can be prepared in advance and saved in the project directory. Here we show how to prepare the file inside the notebook.

[4]:
def write_deacto_user_config(DEA_SYS_PROJ_DIR, config_yaml_str):
    import yaml

    config_file = os.path.join(DEA_SYS_PROJ_DIR,'deacto_user_config.yaml')
    with open(config_file, 'w') as file:
        yaml.dump(yaml.safe_load(config_yaml_str), file)
[5]:
config_input_costs = {}

deacto_user_config_yaml_str = (f"""
DEA_SYS_USER_CONFIG:
  DEA_SYS_PROJECT_NAME: airlines
  DEA_EXPERIMENT_ID: Experiment 1
  DEA_SYS_OUTPUT_FILE : True
DEA_USER_CONFIG:
  Data:
    bo_id: id
    target_feature: satisfaction
  Dbact:
    input_bov:
      'satisfied': 1
      'neutral or dissatisfied': 0
    input_nafs: ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age']
    input_costs: {config_input_costs}
""")

write_deacto_user_config(DEA_SYS_PROJ_DIR, deacto_user_config_yaml_str)
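
An alternative sketch, offered as an option rather than as the recommended Deacto workflow: instead of interpolating `config_input_costs` into an f-string (which relies on the dict's `repr` happening to be valid YAML), the configuration can be built as a nested dict and serialized with `yaml.safe_dump`. The helper name `build_user_config` is hypothetical; the keys are the ones used in the cell above, and PyYAML is assumed to be installed.

```python
import yaml  # PyYAML, the same package used by write_deacto_user_config


def build_user_config(experiment_id, input_bov, input_nafs, input_costs):
    """Hypothetical helper: assemble the config layout used in this notebook."""
    return {
        "DEA_SYS_USER_CONFIG": {
            "DEA_SYS_PROJECT_NAME": "airlines",
            "DEA_EXPERIMENT_ID": experiment_id,
            "DEA_SYS_OUTPUT_FILE": True,
        },
        "DEA_USER_CONFIG": {
            "Data": {"bo_id": "id", "target_feature": "satisfaction"},
            "Dbact": {
                "input_bov": input_bov,
                "input_nafs": input_nafs,
                "input_costs": input_costs,
            },
        },
    }


config = build_user_config(
    experiment_id="Experiment 1",
    input_bov={"satisfied": 1, "neutral or dissatisfied": 0},
    input_nafs=["Customer_Type", "Type_of_Travel", "Class", "Gender", "Age"],
    input_costs={},
)

# Serialize to YAML text; sort_keys=False keeps the insertion order of the keys.
yaml_text = yaml.safe_dump(config, sort_keys=False)

# Round trip: parsing the text should give back the same nested structure.
round_trip_ok = yaml.safe_load(yaml_text) == config
print(yaml_text)
print(round_trip_ok)
```

Because `write_deacto_user_config` accepts a YAML string, `yaml_text` could be passed to it in place of the hand-written string.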

User config description

  • DEA_SYS_USER_CONFIG

    • DEA_SYS_PROJECT_NAME - airlines, corresponding to the dataset topic

    • DEA_EXPERIMENT_ID - Experiment 1, a simple run counter

    • DEA_SYS_OUTPUT_FILE - True

  • DEA_USER_CONFIG

    • Data

      • bo_id - id; the data already contains a business object identifier, where each business object is a customer of the company

      • target_feature - satisfaction, which has two possible values

    • Dbact

      • input_bov - value of 1 for a customer who is ‘satisfied’ and 0 for a customer who is ‘neutral or dissatisfied’ (0/1 customer values)

      • input_nafs - [‘Customer_Type’, ‘Type_of_Travel’, ‘Class’, ‘Gender’, ‘Age’] - features that relate to personal or company attributes of the customer and cannot be changed

      • input_costs - empty, i.e. zero costs
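
To make the 0/1 values concrete, the sketch below pushes a handful of made-up target states through the `input_bov` mapping. The sample states are illustrative only and are not rows from the dataset.

```python
# input_bov from the Experiment 1 config: object state -> value for the company.
input_bov = {"satisfied": 1, "neutral or dissatisfied": 0}

# Made-up sample of target states (illustrative only, not real rows).
sample_states = [
    "satisfied",
    "neutral or dissatisfied",
    "satisfied",
    "neutral or dissatisfied",
    "neutral or dissatisfied",
]

values = [input_bov[state] for state in sample_states]
total_value = sum(values)

print(values)       # [1, 0, 1, 0, 0]
print(total_value)  # 2
```

With 0/1 values the total is simply the number of satisfied customers. On the full data the same quantity could be computed with `airlines_data['satisfaction'].map(input_bov).sum()`.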

Initialize Deacto

[6]:
from deacto import deacto

deacto_analysis = deacto.Deacto(
    DEA_SYS_PROJ_DIR,
    airlines_data
)

Perform deacto analysis

[ ]:
dbact_output_table = deacto_analysis.perform_deacto_analysis()

airlines - 2023-04-07 13:05:42 : INFO     : Starting Deacto for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : INFO     : Starting Deacto data preparation for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : INFO     : Input data validation finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:42 : INFO     : Input data validation for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:42 : DEBUG    : Starting OutlierHandler for 'Experiment 1' ...
airlines - 2023-04-07 13:05:42 : DEBUG    : Function: OutlierHandler.fit_transform, Time: 0.01 sec
airlines - 2023-04-07 13:05:42 : DEBUG    : OutlierHandler for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:43 : DEBUG    : Starting Discretization for 'Experiment 1' ...
airlines - 2023-04-07 13:05:45 : DEBUG    : Function: Discretization.fit_transform, Time: 1.98 sec
airlines - 2023-04-07 13:05:45 : DEBUG    : Discretization for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:45 : DEBUG    : Starting CategoricalEncoding for 'Experiment 1' ...
airlines - 2023-04-07 13:05:45 : DEBUG    : Function: CategoricalEncoding.fit_transform, Time: 0.77 sec
airlines - 2023-04-07 13:05:45 : DEBUG    : CategoricalEncoding for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : DEBUG    : Starting FeatureSelection for 'Experiment 1' ...
airlines - 2023-04-07 13:05:46 : DEBUG    : Function: FeatureSelection.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:46 : DEBUG    : FeatureSelection for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : INFO     : Deacto data preparation for 'Experiment 1' finished.
airlines - 2023-04-07 13:05:46 : INFO     : Deacto data prep finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:46 : INFO     : Starting Deacto modeling for 'Experiment 1' ...
airlines - 2023-04-07 13:05:46 : DEBUG    : depth : 0 , direction All , input_path []
airlines - 2023-04-07 13:05:47 : DEBUG    : find_best_split - feature_name : Online_boarding, cutoff : 4, utility : 38571.1
airlines - 2023-04-07 13:05:47 : DEBUG    : depth : 1 , direction left , input_path ['left']
airlines - 2023-04-07 13:05:49 : DEBUG    : find_best_split - feature_name : Online_boarding, cutoff : 3, utility : 53525.1
airlines - 2023-04-07 13:05:49 : DEBUG    : depth : 2 , direction left , input_path ['left', 'left']
airlines - 2023-04-07 13:05:49 : DEBUG    : depth : 2 , direction right , input_path ['left', 'right']
airlines - 2023-04-07 13:05:49 : DEBUG    : depth : 1 , direction right , input_path ['right']
airlines - 2023-04-07 13:05:53 : DEBUG    : find_best_split - feature_name : Ease_of_Online_booking, cutoff : 4, utility : 96647.5
airlines - 2023-04-07 13:05:53 : DEBUG    : depth : 2 , direction left , input_path ['right', 'left']
airlines - 2023-04-07 13:05:53 : DEBUG    : depth : 2 , direction right , input_path ['right', 'right']
airlines - 2023-04-07 13:05:53 : DEBUG    : Function: DecisionTreeUtilityClassifier.perform_dtopt_dbact, Time: 7.15 sec
airlines - 2023-04-07 13:05:53 : INFO     : Deacto modeling finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:53 : INFO     : Starting Deacto business actions extraction for 'Experiment 1' ...
airlines - 2023-04-07 13:05:53 : INFO     : Deacto business actions extraction finished successfully for 'Experiment 1'.
airlines - 2023-04-07 13:05:53 : INFO     : Deacto finished successfully for 'Experiment 1'.

Deacto output table exploration

[ ]:
dbact_output_table
   pre_condition                                      set                           change                                             benefit  costs  utility  ROI  utility_pct
0  Online_boarding : [1, 2, 3]                        Ease_of_Online_booking : [5]  Online_boarding : 1 -> [5] , Online_boarding :...  36317.4  0      36317.4  inf  79.4
1  Online_boarding : [4]                              Ease_of_Online_booking : [5]  Online_boarding : 4 -> [5]                         8448.7   0      8448.7   inf  18.5
2  Online_boarding : [5] , Ease_of_Online_booking...                                Ease_of_Online_booking : 1 -> [5] , Ease_of_On...  984.2    0      984.2    inf  2.2

Deacto output table specification

Each row describes one action set, its description and evaluation metrics.

Business can decide to act based on one or more action sets in the table.

Columns and their description :

  • Action description :

    • pre_condition - defines the subpopulation of objects to which the action is applied

    • change - actions that change a feature from a given source value to a target value

    • set - actions that set a feature to a specific (target) value regardless of its source value

  • Action evaluation :

    • benefit - added value from the action set

    • costs - overall cost of action set

    • utility - lower bound of the utility (benefit minus costs)

    • ROI - lower bound of the ROI, the utility-to-costs ratio (%)

    • utility_pct - ratio of the action set utility to total utility (%)
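
The evaluation columns can be cross-checked with a few lines of arithmetic. The sketch below assumes that utility = benefit - costs, which is consistent with the numbers in the Experiment 1 output table but is our reading rather than something confirmed by Deacto's documentation; the "lower bound" aspect is not modelled. The function `roi_pct` is a hypothetical helper.

```python
import math

# (benefit, costs) per action set, copied from the Experiment 1 output table.
action_sets = [(36317.4, 0.0), (8448.7, 0.0), (984.2, 0.0)]


def roi_pct(utility, costs):
    """Utility-to-costs ratio in percent; infinite when there are no costs."""
    return math.inf if costs == 0 else 100.0 * utility / costs


# Assumption: utility is benefit minus costs.
utilities = [benefit - costs for benefit, costs in action_sets]
total_utility = sum(utilities)

# Share of each action set in the total utility, in percent.
utility_pcts = [round(100.0 * u / total_utility, 1) for u in utilities]
rois = [roi_pct(u, costs) for u, (_, costs) in zip(utilities, action_sets)]

print(utility_pcts)  # [79.4, 18.5, 2.2]
print(rois)          # [inf, inf, inf]
```

The percentages match the utility_pct column, and the zero costs explain why ROI is reported as inf.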

Experiment 1 output table exploration

The first action set achieves ~79.4% of the total utility (utility_pct = 79.4) and should be examined.

[ ]:
dbact_output_table.loc[0].to_dict()
{'pre_condition': 'Online_boarding : [1, 2, 3]',
 'set': 'Ease_of_Online_booking : [5]',
 'change': 'Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5]',
 'benefit': 36317.4,
 'costs': 0,
 'utility': 36317.4,
 'ROI': inf,
 'utility_pct': 79.4}
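
For further processing it can be convenient to split the 'change' string into structured pieces. The sketch below is a minimal parser that assumes the `feature : source -> [target]` layout seen in the output above; `parse_change` is a hypothetical helper, not a Deacto function, and other output layouts may need a different pattern.

```python
import re

# Matches one 'feature : source -> [target]' item as printed in the output above.
CHANGE_ITEM = re.compile(r"(\w+)\s*:\s*(\S+)\s*->\s*\[([^\]]*)\]")


def parse_change(change):
    """Split a 'change' string into (feature, source, target) tuples of strings."""
    return [(m.group(1), m.group(2), m.group(3)) for m in CHANGE_ITEM.finditer(change)]


change = "Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5]"
print(parse_change(change))
# [('Online_boarding', '1', '5'), ('Online_boarding', '2', '5'), ('Online_boarding', '3', '5')]
```

Values are kept as strings so that the parser does not have to guess whether a level is numeric or categorical.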

Experiment 1 - Conclusions

  • Online_boarding and Ease_of_Online_booking satisfaction levels are important for action discovery

  • The proposed change is very significant: Ease_of_Online_booking is set to the highest satisfaction level (5), and Online_boarding is raised from levels 1, 2 and 3 to the highest satisfaction level (5).

  • The actions make sense in the absence of costs. The expected benefit is ~36,000, which, with 0/1 values and zero costs, corresponds to the number of additional customers expected to be satisfied.

  • This assumes that the company is able to directly influence the satisfaction levels above

Experiment 2 - 0/1 business object value + some action costs

Let us consider a more realistic scenario in which improving the satisfaction level for ‘Ease_of_Online_booking’ and ‘Online_boarding’ has a cost that increases linearly with the size of the improvement.

We put extremely high costs on decreasing satisfaction levels, since such a change isn’t acceptable.

The costs apply to each customer, i.e. they are variable costs.

We also increase the value of a customer in the desired state to 5.

Set input costs

[ ]:
config_input_costs = {}

for f in ['Ease_of_Online_booking','Online_boarding','Inflight_entertainment','Inflight_service']:
    # Cast to plain int so the keys read '(2, 5)' regardless of the NumPy version
    f_domain = sorted(int(v) for v in airlines_data[f].unique())
    # Raising a level from i to j costs (j - i); lowering it is priced at 1000 to rule it out
    i_costs  = {str((i,j)):  (j - i) if j >= i else 1000 for i in f_domain for j in f_domain}
    config_input_costs[f] = i_costs

print(f"'Online_boarding' changing from 2 to 5 costs is : {config_input_costs['Online_boarding']['(2, 5)']}")
print(f"'Ease_of_Online_booking' changing from 3 to 2 costs is : {config_input_costs['Ease_of_Online_booking']['(3, 2)']}")
'Online_boarding' changing from 2 to 5 costs is : 3
'Ease_of_Online_booking' changing from 3 to 2 costs is : 1000
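
A back-of-the-envelope reading of these costs, under a simplifying assumption that is ours and not necessarily how Deacto evaluates actions: if a change costs c per customer and a satisfied customer is worth 5 (the Experiment 2 value), the change pays off in expectation only when it raises the probability of satisfaction by more than c / 5. The function names below are hypothetical.

```python
SATISFIED_VALUE = 5  # value of a customer in the desired state (Experiment 2)


def change_cost(source, target):
    """Cost rule from the cell above: linear for increases, prohibitive for decreases."""
    return (target - source) if target >= source else 1000


def break_even_uplift(source, target, value=SATISFIED_VALUE):
    """Assumed expected-value model: minimum rise in P(satisfied) for the change to pay off."""
    return change_cost(source, target) / value


# Cost and break-even uplift for raising each lower level to 5.
for level in range(1, 5):
    print(level, change_cost(level, 5), break_even_uplift(level, 5))
# 1 4 0.8
# 2 3 0.6
# 3 2 0.4
# 4 1 0.2
```

Under this reading, raising a level from 1 to 5 is only worthwhile if it lifts the satisfaction probability by more than 0.8, whereas raising it from 4 to 5 needs a lift of just 0.2.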

Set user config

[ ]:

config_yaml_str = (f"""
DEA_SYS_USER_CONFIG:
  DEA_SYS_PROJECT_NAME: airlines
  DEA_EXPERIMENT_ID: Experiment 2
  DEA_SYS_OUTPUT_FILE : True
DEA_USER_CONFIG:
  Data:
    bo_id: id
    target_feature: satisfaction
  Dbact:
    input_bov:
      'satisfied': 5
      'neutral or dissatisfied': 0
    input_nafs: ['Customer_Type', 'Type_of_Travel', 'Class', 'Gender', 'Age']
    input_costs: {config_input_costs}
""")

write_deacto_user_config(DEA_SYS_PROJ_DIR, config_yaml_str)

Run Deacto

[ ]:
from deacto import deacto

deacto_analysis = deacto.Deacto(
    DEA_SYS_PROJ_DIR,
    airlines_data
)

dbact_output_table = deacto_analysis.perform_deacto_analysis()
airlines - 2023-04-07 13:05:54 : INFO     : Deacto configuration finished successfully .
airlines - 2023-04-07 13:05:54 : INFO     : Starting Deacto for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : INFO     : Starting Deacto data preparation for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : INFO     : Input data validation finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:05:54 : INFO     : Input data validation for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:54 : DEBUG    : Starting OutlierHandler for 'Experiment 2' ...
airlines - 2023-04-07 13:05:54 : DEBUG    : Function: OutlierHandler.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:54 : DEBUG    : OutlierHandler for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:54 : DEBUG    : Starting Discretization for 'Experiment 2' ...
airlines - 2023-04-07 13:05:55 : DEBUG    : Function: Discretization.fit_transform, Time: 1.21 sec
airlines - 2023-04-07 13:05:55 : DEBUG    : Discretization for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:56 : DEBUG    : Starting CategoricalEncoding for 'Experiment 2' ...
airlines - 2023-04-07 13:05:56 : DEBUG    : Function: CategoricalEncoding.fit_transform, Time: 0.8 sec
airlines - 2023-04-07 13:05:56 : DEBUG    : CategoricalEncoding for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : DEBUG    : Starting FeatureSelection for 'Experiment 2' ...
airlines - 2023-04-07 13:05:57 : DEBUG    : Function: FeatureSelection.fit_transform, Time: 0.0 sec
airlines - 2023-04-07 13:05:57 : DEBUG    : FeatureSelection for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : INFO     : Deacto data preparation for 'Experiment 2' finished.
airlines - 2023-04-07 13:05:57 : INFO     : Deacto data prep finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:05:57 : INFO     : Starting Deacto modeling for 'Experiment 2' ...
airlines - 2023-04-07 13:05:57 : DEBUG    : depth : 0 , direction All , input_path []
airlines - 2023-04-07 13:05:58 : DEBUG    : find_best_split - feature_name : Online_boarding, cutoff : 4, utility : 25836.3
airlines - 2023-04-07 13:05:58 : DEBUG    : depth : 1 , direction left , input_path ['left']
airlines - 2023-04-07 13:06:03 : DEBUG    : find_best_split - feature_name : Class, cutoff : 1, utility : 40715.3
airlines - 2023-04-07 13:06:03 : DEBUG    : depth : 2 , direction left , input_path ['left', 'left']
airlines - 2023-04-07 13:06:03 : DEBUG    : depth : 2 , direction right , input_path ['left', 'right']
airlines - 2023-04-07 13:06:03 : DEBUG    : depth : 1 , direction right , input_path ['right']
airlines - 2023-04-07 13:06:21 : DEBUG    : Function: DecisionTreeUtilityClassifier.perform_dtopt_dbact, Time: 24.09 sec
airlines - 2023-04-07 13:06:21 : INFO     : Deacto modeling finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:06:21 : INFO     : Starting Deacto business actions extraction for 'Experiment 2' ...
airlines - 2023-04-07 13:06:21 : INFO     : Deacto business actions extraction finished successfully for 'Experiment 2'.
airlines - 2023-04-07 13:06:21 : INFO     : Deacto finished successfully for 'Experiment 2'.

Deacto output table exploration

[ ]:
dbact_output_table
   pre_condition                                      set  change                                             benefit   costs     utility  ROI   utility_pct
0  Online_boarding : [1, 2, 3, 4] , Class : ['Eco...       Online_boarding : 1 -> [5] , Online_boarding :...  148488.3  107773.0  40715.3  37.8  100.0

Experiment 2 output table exploration

Only one action set is found; it accounts for all of the achievable utility (utility_pct = 100.0) and should be examined.

[ ]:
dbact_output_table.loc[0].to_dict()
{'pre_condition': "Online_boarding : [1, 2, 3, 4] , Class : ['Eco', 'Eco Plus']",
 'set': '',
 'change': 'Online_boarding : 1 -> [5] , Online_boarding : 2 -> [5] , Online_boarding : 3 -> [5] , Online_boarding : 4 -> [5]',
 'benefit': 148488.3,
 'costs': 107773.0,
 'utility': 40715.3,
 'ROI': 37.8,
 'utility_pct': 100.0}
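
To act on an action set, one would first select the customers that match its pre_condition. The sketch below does this with pandas on a made-up five-row frame; the column names follow `airlines_data`, but the rows are illustrative only.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the dataset.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Online_boarding": [2, 5, 4, 3, 1],
    "Class": ["Eco", "Eco", "Business", "Eco Plus", "Business"],
})

# pre_condition: Online_boarding in [1, 2, 3, 4] and Class in ['Eco', 'Eco Plus']
mask = sample["Online_boarding"].isin([1, 2, 3, 4]) & sample["Class"].isin(["Eco", "Eco Plus"])
target_customers = sample.loc[mask, "id"].tolist()

print(target_customers)  # [1, 4]
```

The same mask applied to `airlines_data` would give the ids of the customers to whom the proposed Online_boarding change applies.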

Experiment 2 - Conclusions

  • The Online_boarding satisfaction level is the most important for action discovery

  • The proposed change is very significant: Online_boarding is raised from levels 1, 2, 3 and 4 to the highest satisfaction level (5) for the ‘Eco’ and ‘Eco Plus’ classes

  • While it makes sense that customers would be satisfied with “Business” class service, to optimize benefits the company is recommended to prioritize its effort on online boarding.