Post-process

Overview:

PostProcess

class epios.PostProcess(demo_data: DataFrame, time_data: DataFrame)

This class automatically samples the population at several given time points, generates plots, and compares the sampled results with the true infection level within the population.

How to use:

Define an instance with the demographic and time data of the population, then use self.predict to generate plots and comparisons.

To define an instance of PostProcess, you need the following inputs:

Parameters:

demo_data (pandas.DataFrame): The geographical data of the population
time_data (pandas.DataFrame): The infection data of the population at different time points

class Prediction(demo_data: DataFrame, time_data: DataFrame)

This sub-class automatically samples the population at several given time points.

An instance of this sub-class is created automatically when an instance of PostProcess is defined.

To use this class, call the methods defined under it to sample and generate plots.

Parameters:

demo_data (pandas.DataFrame): The geographical data of the population
time_data (pandas.DataFrame): The infection data of the population at different time points

Age(sample_size, time_sample, comparison=True, non_resp_rate=None, sample_strategy='Random', gen_plot: bool = False, saving_path_sampling=None, num_age_group=17, age_group_width=5, data_store_path='./input/', seed=None, saving_path_compare=None, scale_method='proportional')

This method samples the population and plots figures using age stratification.

Parameters:

sample_size (int)

The size of the sample

time_sample (list)

A list of time points at which to sample the population

comparison (bool)

Turn on or off the comparison between the sampled result and the true result

Default = True

sample_strategy (str)

A string indicating whether to change the sampled people between sampling time points

Accepted strings: ['Random', 'Same']

Default = 'Random'

gen_plot (bool)

Whether or not to generate plots

Default = False

saving_path_sampling (str)

The path to save the figure showing the predicted infection level

Default = None

saving_path_compare (str)

The path to save the figure showing the comparison between the predicted and true infection levels

Default = None

num_age_group (int)

The number of age groups

The last group includes all ages >= some threshold

Default = 17

age_group_width (int)

The width of each age group (except for the last group)

Default = 5

scale_method (str)

A string specifying how to compare the sampled data with the true population

Default = 'proportional'

data_store_path (str)

The path to store data generated during sampling

Default = './input/'

seed (int or None)

The seed for random numbers

Default = None
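
The interaction between num_age_group and age_group_width can be sketched as follows. This is only an illustration, assuming that groups start at age 0 and that the open-ended last group absorbs every age at or above (num_age_group - 1) * age_group_width. The helper name age_group_index is hypothetical and is not part of epios.

```python
def age_group_index(age, num_age_group=17, age_group_width=5):
    """Return the 0-based age group for `age` (hypothetical helper, not an epios function)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # Assumption: groups are [0, w), [w, 2w), ...; the last group is open-ended.
    return min(int(age // age_group_width), num_age_group - 1)


# With the defaults, the open-ended last group starts at (17 - 1) * 5 = 80.
print(age_group_index(4))    # 0
print(age_group_index(37))   # 7
print(age_group_index(95))   # 16
```

Under this assumption, the defaults give sixteen 5-year groups covering ages 0 to 79 and one final group for ages 80 and above.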

AgeRegion(sample_size, time_sample, non_responder=False, comparison=True, non_resp_rate=None, sample_strategy='Random', gen_plot: bool = False, saving_path_sampling=None, num_age_group=17, age_group_width=5, data_store_path='./input/', sampling_percentage=0.1, proportion=0.01, threshold=None, seed=None, saving_path_compare=None, scale_method='proportional')

This method samples the population and plots figures using both age and region stratification.

Parameters:

sample_size (int)

The size of the sample

time_sample (list)

A list of time points at which to sample the population

non_responder (bool)

Turn on or off the non-responder function

Default = False

non_resp_rate (float, between 0 and 1)

The probability that a person does not respond

Default = None

comparison (bool)

Turn on or off the comparison between the sampled result and the true result

Default = True

sample_strategy (str)

A string indicating whether to change the sampled people between sampling time points

Accepted strings: ['Random', 'Same']

Default = 'Random'

gen_plot (bool)

Whether or not to generate plots

Default = False

saving_path_sampling (str)

The path to save the figure showing the predicted infection level

Default = None

saving_path_compare (str)

The path to save the figure showing the comparison between the predicted and true infection levels

Default = None

num_age_group (int)

The number of age groups

The last group includes all ages >= some threshold

Default = 17

age_group_width (int)

The width of each age group (except for the last group)

Default = 5

scale_method (str)

A string specifying how to compare the sampled data with the true population

Default = 'proportional'

sampling_percentage (float, between 0 and 1)

The proportion of additional samples taken from a specific (age-)regional group

Default = 0.1 (only for non-responders)

proportion (float, between 0 and 1)

The proportion of total groups to be sampled additionally

Default = 0.01 (only for non-responders)

threshold (int or None)

The lowest number of groups to be sampled additionally

Default = None (only for non-responders)

data_store_path (str)

The path to store data generated during sampling

Default = './input/'

seed (int or None)

The seed for random numbers

Default = None
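
To make the meaning of non_resp_rate concrete, the sketch below simulates non-response for a list of sampled people. It assumes that each person independently fails to respond with probability non_resp_rate; how epios models non-response internally is not stated here, and simulate_responses is a hypothetical helper rather than an epios function.

```python
import random


def simulate_responses(sampled_ids, non_resp_rate, seed=None):
    """Split sampled people into responders and non-responders (illustrative only)."""
    if not 0 <= non_resp_rate <= 1:
        raise ValueError("non_resp_rate must be between 0 and 1")
    rng = random.Random(seed)
    responders, non_responders = [], []
    for person in sampled_ids:
        # Assumption: each person independently fails to respond
        # with probability non_resp_rate.
        if rng.random() < non_resp_rate:
            non_responders.append(person)
        else:
            responders.append(person)
    return responders, non_responders


responders, non_responders = simulate_responses(list(range(100)), non_resp_rate=0.1, seed=1)
print(len(responders), len(non_responders))
```

Passing a seed makes the split reproducible, in the same spirit as the seed parameter above.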

Base(sample_size, time_sample, comparison=True, non_resp_rate=None, sample_strategy='Random', gen_plot: bool = False, saving_path_sampling=None, num_age_group=17, age_group_width=5, data_store_path='./input/', seed=None, saving_path_compare=None, scale_method='proportional')

This method samples the population and plots figures using the 'Base' method.

Parameters:

sample_size (int)

The size of the sample

time_sample (list)

A list of time points at which to sample the population

comparison (bool)

Turn on or off the comparison between the sampled result and the true result

Default = True

sample_strategy (str)

A string indicating whether to change the sampled people between sampling time points

Accepted strings: ['Random', 'Same']

Default = 'Random'

gen_plot (bool)

Whether or not to generate plots

Default = False

saving_path_sampling (str)

The path to save the figure showing the predicted infection level

Default = None

saving_path_compare (str)

The path to save the figure showing the comparison between the predicted and true infection levels

Default = None

scale_method (str)

A string specifying how to compare the sampled data with the true population

Default = 'proportional'

data_store_path (str)

The path to store data generated during sampling

Default = './input/'

seed (int or None)

The seed for random numbers

Default = None
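
The difference between the two sample_strategy values can be sketched with the standard library alone. The code assumes that 'Same' follows one fixed group of people across all time points while 'Random' draws a fresh group at each time point; draw_samples is a hypothetical helper and does not reproduce the epios sampling logic.

```python
import random


def draw_samples(population_ids, sample_size, time_sample,
                 sample_strategy="Random", seed=None):
    """Return {time point: sampled ids} (hypothetical helper, not part of epios)."""
    if sample_strategy not in ("Random", "Same"):
        raise ValueError("sample_strategy must be 'Random' or 'Same'")
    rng = random.Random(seed)
    fixed = rng.sample(population_ids, sample_size)  # cohort reused by 'Same'
    samples = {}
    for t in time_sample:
        if sample_strategy == "Same":
            samples[t] = list(fixed)
        else:
            # 'Random': draw a fresh set of people at each time point
            samples[t] = rng.sample(population_ids, sample_size)
    return samples


same = draw_samples(list(range(50)), 3, [0, 1, 2, 3], sample_strategy="Same", seed=0)
print(all(ids == same[0] for ids in same.values()))  # True
```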

Region(sample_size, time_sample, non_responder=False, comparison=True, non_resp_rate=None, sample_strategy='Random', gen_plot: bool = False, saving_path_sampling=None, data_store_path='./input/', sampling_percentage=0.1, proportion=0.01, threshold=None, seed=None, saving_path_compare=None, scale_method='proportional')

This method samples the population and plots figures using region stratification.

Parameters:

sample_size (int)

The size of the sample

time_sample (list)

A list of time points at which to sample the population

non_responder (bool)

Turn on or off the non-responder function

Default = False

non_resp_rate (float, between 0 and 1)

The probability that a person does not respond

Default = None

comparison (bool)

Turn on or off the comparison between the sampled result and the true result

Default = True

sample_strategy (str)

A string indicating whether to change the sampled people between sampling time points

Accepted strings: ['Random', 'Same']

Default = 'Random'

gen_plot (bool)

Whether or not to generate plots

Default = False

saving_path_sampling (str)

The path to save the figure showing the predicted infection level

Default = None

saving_path_compare (str)

The path to save the figure showing the comparison between the predicted and true infection levels

Default = None

scale_method (str)

A string specifying how to compare the sampled data with the true population

Default = 'proportional'

sampling_percentage (float, between 0 and 1)

The proportion of additional samples taken from a specific (age-)regional group

Default = 0.1 (only for non-responders)

proportion (float, between 0 and 1)

The proportion of total groups to be sampled additionally

Default = 0.01 (only for non-responders)

threshold (int or None)

The lowest number of groups to be sampled additionally

Default = None (only for non-responders)

data_store_path (str)

The path to store data generated during sampling

Default = './input/'

seed (int or None)

The seed for random numbers

Default = None

best_method(methods, sample_size, hyperparameter_autotune=False, non_responder=False, non_resp_rate=None, sampling_interval=7, parallel_computation=True, metric='mean', iteration=100, data_store_path='./input/', **kwargs)

Prints the best method among the methods provided.

When hyperparameter autotuning is on, this method first prints the best parameter combination and its performance for each method, then prints the best method across all methods.

The best parameter set is printed in the following order:

('num_age_group', 'age_group_width', 'sampling_percentage', 'proportion', 'threshold')

A parameter is omitted if it is not applicable to the method.
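
As a rough picture of what autotuning iterates over, the sketch below enumerates parameter combinations with itertools.product, keeping the documented ordering and dropping parameters that do not apply. The ranges are the defaults listed under kwargs; the function parameter_grid and the choice of which parameters apply to which method are assumptions made for illustration, not the epios implementation.

```python
from itertools import product

# Default ranges as listed in the kwargs documentation for best_method.
ranges = {
    "num_age_group": [10, 13, 15, 17, 20],
    "age_group_width": [5, 10],
    "sampling_percentage": [0.1, 0.2, 0.3],
    "proportion": [0.01, 0.05, 0.1],
    "threshold": [10, 20, 30],
}


def parameter_grid(applicable):
    """List every combination of the applicable parameters, in the documented order."""
    names = [name for name in ranges if name in applicable]
    return [dict(zip(names, combo))
            for combo in product(*(ranges[n] for n in names))]


# Assumption: an age-stratified method without non-responders
# only uses the two age parameters.
grid = parameter_grid({"num_age_group", "age_group_width"})
print(len(grid))   # 10
print(grid[0])     # {'num_age_group': 10, 'age_group_width': 5}
```

The grid grows multiplicatively with each applicable parameter, which is worth keeping in mind when choosing iteration.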

Features:

When a range of parameters is provided, the hyperparameters can be tuned automatically

Can be set to consider non-responders

Prints any unrecognised inputs or methods

Parameters:

methods (list)

A list of strings indicating the methods to compare with each other. Acceptable methods:

Using the 'Same' strategy:
'Age-Same', 'Region-Same', 'AgeRegion-Same', 'Base-Same'

Using the 'Random' strategy:
'Age-Random', 'Region-Random', 'AgeRegion-Random', 'Base-Random'

Note: When a method name is given without a sample strategy, 'Random' is used by default

sample_size (int)

The size of the sample

hyperparameter_autotune (bool)

Whether or not to turn on automatic hyperparameter tuning

For the extra inputs, see the documentation for the parameter 'kwargs' below

non_responder (bool)

Whether or not to consider non-responders

sampling_interval (int)

The number of days between consecutive sampling time points

metric (str)

The metric used to convert the difference between the sampled result and the true infection level into a single float that measures performance. Acceptable metrics:

'mean':
Use the mean of the absolute difference between the true and predicted infection levels. All NaN values are ignored

'max':
Use the maximum of the absolute difference between the true and predicted infection levels.

iteration (int)

The number of iterations to run; the prediction values are averaged over the iterations to get a robust result

parallel_computation (bool)

Whether or not to use multiprocessing to speed up the repeated runs

Default = True

Note: When this is turned on, you cannot call this method directly; it must be called under an if __name__ == '__main__': guard, as in the example below

data_store_path (str)

The path to store files generated during sampling

Default = './input/'

This is used only when parallel computation is disabled

kwargs (dict)

Additional keyword parameters passed to the processing step. The following parameters can be passed:

num_age_group (int)
The number of age groups

Default = 17

The last group includes all ages >= some threshold

This is used when autotuning is turned off

age_group_width (int)
The width of each age group (except for the last group)

Default = 5

This is used when autotuning is turned off

sampling_percentage (float, between 0 and 1)
The proportion of additional samples taken from a specific (age-)regional group

Default = 0.1 (only for non-responders)

This is used when autotuning is turned off

proportion (float, between 0 and 1)
The proportion of total groups to be sampled additionally

Default = 0.01 (only for non-responders)

This is used when autotuning is turned off

threshold (int or None)
The lowest number of groups to be sampled additionally

Default = None (only for non-responders)

This is used when autotuning is turned off

num_age_group_range (list)
All numbers of age groups that you want to try

Default = [10, 13, 15, 17, 20]

The last group includes all ages >= some threshold

This is used when autotuning is turned on

age_group_width_range (list)
All age group widths (except for the last group) that you want to try

Default = [5, 10]

This is used when autotuning is turned on

sampling_percentage_range (list)
All proportions of additional samples taken from a specific (age-)regional group that you want to try

Default = [0.1, 0.2, 0.3] (only for non-responders)

This is used when autotuning is turned on

proportion_range (list)
All proportions of total groups to be sampled additionally that you want to try

Default = [0.01, 0.05, 0.1] (only for non-responders)

This is used when autotuning is turned on

threshold_range (list)
All values of the lowest number of groups to be sampled additionally that you want to try

Default = [10, 20, 30] (only for non-responders)

This is used when autotuning is turned on
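
The 'mean' and 'max' options of the metric parameter described above can be illustrated with a small standalone function. This is a sketch rather than the epios implementation: the name score is hypothetical, and it assumes that NaN differences are skipped for 'max' as well as for 'mean' (the documentation only states this for 'mean').

```python
import math


def score(true_values, predicted_values, metric="mean"):
    """Collapse |true - predicted| into one float (illustrative, not the epios code)."""
    diffs = [abs(t - p) for t, p in zip(true_values, predicted_values)]
    valid = [d for d in diffs if not math.isnan(d)]  # ignore NaN entries
    if not valid:
        return math.nan
    if metric == "mean":
        return sum(valid) / len(valid)
    if metric == "max":
        return max(valid)
    raise ValueError(f"Unrecognised metric: {metric!r}")


true_level = [0.5, 0.25, math.nan]
predicted = [0.25, 0.25, 0.5]
print(score(true_level, predicted, "mean"))  # 0.125
print(score(true_level, predicted, "max"))   # 0.25
```

A lower score means the sampled estimate tracks the true infection level more closely; 'max' penalises the single worst time point, while 'mean' averages over all of them.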

Here is an example of using PostProcess:

import epios
import pandas as pd

# Define the simulation output data
demo_data = pd.read_csv('demographics.csv')
time_data = pd.read_csv('inf_status_history.csv')

# Define the class instance
postprocess = epios.PostProcess(time_data=time_data, demo_data=demo_data)

# Do prediction and comparison based on age-region stratification
result, diff = postprocess.predict.AgeRegion(sample_size=3,
                                             time_sample=[0, 1, 2, 3],
                                             comparison=True,
                                             non_responder=False,
                                             gen_plot=True,
                                             sample_strategy='Random')

# Define the input keywords for finding the best method
best_method_kwargs = {
    'age_group_width_range': [14, 17, 20]
}

# Suppose we want to compare among methods Age-Random, Base-Same,
# Base-Random, Region-Random and AgeRegion-Random

# And suppose we want to turn on the parallel computation to speed up
if __name__ == '__main__':  # This line can be omitted when not using parallel computation
    postprocess.best_method(
        methods=[
            'Age',
            'Base-Same',
            'Base-Random',
            'Region-Random',
            'AgeRegion-Random'
        ],
        sample_size=3,
        hyperparameter_autotune=True,
        non_responder=False,
        sampling_interval=7,
        iteration=1,
        # When considering non-responders, input the following line
        # non_resp_rate=0.1,
        metric='mean',
        parallel_computation=True,
        **best_method_kwargs
    )
# Then the output will be printed
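
In the call above, 'Age' is passed without a strategy suffix, so it is treated as 'Age-Random'. The defaulting rule can be sketched as below; parse_method is a hypothetical helper written for illustration. Note one deliberate difference: this sketch raises an error for unrecognised names, whereas best_method is documented to print them.

```python
def parse_method(name, default_strategy="Random"):
    """Split 'Age-Same' into ('Age', 'Same'); a bare 'Age' gets the default strategy."""
    valid_methods = {"Age", "Region", "AgeRegion", "Base"}
    valid_strategies = {"Random", "Same"}
    method, _, strategy = name.partition("-")
    strategy = strategy or default_strategy
    if method not in valid_methods or strategy not in valid_strategies:
        # Assumption for this sketch only: fail loudly on unknown names.
        raise ValueError(f"Unrecognised method: {name!r}")
    return method, strategy


print(parse_method("Age"))        # ('Age', 'Random')
print(parse_method("Base-Same"))  # ('Base', 'Same')
```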