Post-process
Overview:
- class epios.PostProcess(demo_data: DataFrame, time_data: DataFrame)
This class automatically samples the population at several given time points, generates plots, and compares the sampled results with the true infection level within the population.
How to use:
Create an instance with the demographic and time data of the population, then call the methods of self.predict to generate plots and comparisons.
To define an instance of PostProcess, you need the following inputs:
Parameters:
- demo_data (pandas.DataFrame)
The geographical data of the population
- time_data (pandas.DataFrame)
The infection data of the population at different time points
- class Prediction(demo_data: DataFrame, time_data: DataFrame)
This nested class automatically samples the population at several given time points.
It is created automatically when an instance of PostProcess is defined, and is accessible through the predict attribute.
To use it, call its methods to sample the population and generate plots.
Parameters:
- demo_data (pandas.DataFrame)
The geographical data of the population
- time_data (pandas.DataFrame)
The infection data of the population at different time points
- Age(sample_size, time_sample, comparison=True, non_resp_rate=None, sample_strategy='Random', gen_plot: bool = False, saving_path_sampling=None, num_age_group=17, age_group_width=5, data_store_path='./input/', seed=None, saving_path_compare=None, scale_method='proportional')
This method samples the population and plots figures using age stratification.
Parameters:
- sample_size (int)
The size of the sample
- time_sample (list)
A list of time points at which to sample the population
- comparison (bool)
Turn on or off the comparison between the sampled result and the true result
Default = True
- sample_strategy (str)
Whether to sample different people at each sampling time point (‘Random’) or to keep sampling the same people (‘Same’)
Accepted values: [‘Random’, ‘Same’]
Default = ‘Random’
- gen_plot (bool)
Whether or not to generate plots
Default = False
- saving_path_sampling (str)
The path to save the figure showing the predicted infection level
Default = None
- saving_path_compare (str)
The path to save the figure showing the comparison between the predicted and true infection levels
Default = None
- num_age_group (int)
The number of age groups
The last group includes every age at or above its lower bound
Default = 17
- age_group_width (int)
The width of each age group (except for the last group)
Default = 5
- scale_method (str)
A string specifying how the sampled data are compared with the true population
Default = ‘proportional’
- data_store_path (str)
The path to store data generated during sampling
Default = ‘./input/’
- seed (int or None)
The seed for random numbers
Default = None
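The num_age_group and age_group_width parameters describe fixed-width age bands with an open-ended last band. The following is a minimal sketch of that grouping, assuming the bands start at age 0 so that the last band begins at (num_age_group - 1) * age_group_width (80+ with the defaults). The helpers age_group_index and age_group_labels are hypothetical and are not part of epios.

```python
def age_group_index(age: int, num_age_group: int = 17, age_group_width: int = 5) -> int:
    """Map an age (in whole years) to a 0-based age-group index.

    Hypothetical helper. Groups 0 .. num_age_group - 2 each span
    age_group_width years; the last group collects every age at or above
    (num_age_group - 1) * age_group_width.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    return min(age // age_group_width, num_age_group - 1)


def age_group_labels(num_age_group: int = 17, age_group_width: int = 5) -> list:
    """Return readable labels such as '0-4', '5-9', ..., '80+' (hypothetical helper)."""
    labels = []
    for i in range(num_age_group - 1):
        start = i * age_group_width
        labels.append(f"{start}-{start + age_group_width - 1}")
    # Open-ended last group
    labels.append(f"{(num_age_group - 1) * age_group_width}+")
    return labels


if __name__ == "__main__":
    print(age_group_labels())   # 17 labels from '0-4' to '80+'
    print(age_group_index(83))  # 16, the last group
```

Capping the index with min() is what makes the last band open-ended, so very old ages never fall outside the configured number of groups.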
- AgeRegion(sample_size, time_sample, non_responder=False, comparison=True, non_resp_rate=None, sample_strategy='Random', gen_plot: bool = False, saving_path_sampling=None, num_age_group=17, age_group_width=5, data_store_path='./input/', sampling_percentage=0.1, proportion=0.01, threshold=None, seed=None, saving_path_compare=None, scale_method='proportional')
This method samples the population and plots figures using both age and region stratification.
Parameters:
- sample_size (int)
The size of the sample
- time_sample (list)
A list of time points at which to sample the population
- non_responder (bool)
Turn on or off the non-responder function
Default = False
- non_resp_rate (float, between 0 and 1)
The probability that a person does not respond
Default = None
- comparison (bool)
Turn on or off the comparison between the sampled result and the true result
Default = True
- sample_strategy (str)
Whether to sample different people at each sampling time point (‘Random’) or to keep sampling the same people (‘Same’)
Accepted values: [‘Random’, ‘Same’]
Default = ‘Random’
- gen_plot (bool)
Whether or not to generate plots
Default = False
- saving_path_sampling (str)
The path to save the figure showing the predicted infection level
Default = None
- saving_path_compare (str)
The path to save the figure showing the comparison between the predicted and true infection levels
Default = None
- num_age_group (int)
The number of age groups
The last group includes every age at or above its lower bound
Default = 17
- age_group_width (int)
The width of each age group (except for the last group)
Default = 5
- scale_method (str)
A string specifying how the sampled data are compared with the true population
Default = ‘proportional’
- sampling_percentage (float, between 0 and 1)
The proportion of additional samples taken from a specific (age-)regional group
Default = 0.1 (used only when non-responders are considered)
- proportion (float, between 0 and 1)
The proportion of all groups that receive additional samples
Default = 0.01 (used only when non-responders are considered)
- threshold (None or int)
The minimum number of groups that receive additional samples
Default = None (used only when non-responders are considered)
- data_store_path (str)
The path to store data generated during sampling
Default = ‘./input/’
- seed (int or None)
The seed for random numbers
Default = None
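The non-responder options above can be pictured with a minimal sketch. It assumes that each sampled person independently fails to respond with probability non_resp_rate, and that the number of (age-)regional groups given additional samples is proportion times the total number of groups, rounded up, but never fewer than threshold when one is set. Both helpers (simulate_responses and num_groups_to_resample) are hypothetical and only illustrate the parameters; they are not the epios implementation.

```python
import math
import random


def simulate_responses(sampled_ids, non_resp_rate, seed=None):
    """Split sampled people into responders and non-responders.

    Hypothetical helper. Assumption: each person independently fails to
    respond with probability non_resp_rate.
    """
    rng = random.Random(seed)
    responders, non_responders = [], []
    for person in sampled_ids:
        if rng.random() < non_resp_rate:
            non_responders.append(person)
        else:
            responders.append(person)
    return responders, non_responders


def num_groups_to_resample(total_groups, proportion=0.01, threshold=None):
    """Number of (age-)regional groups that receive additional samples.

    Hypothetical helper. Assumption: proportion * total_groups, rounded up,
    raised to at least threshold when a threshold is given, and capped at
    total_groups.
    """
    n = math.ceil(proportion * total_groups)
    if threshold is not None:
        n = max(n, threshold)
    return min(n, total_groups)


if __name__ == "__main__":
    responders, non_responders = simulate_responses(range(20), non_resp_rate=0.1, seed=0)
    print(len(responders), len(non_responders))
    print(num_groups_to_resample(total_groups=500, proportion=0.01, threshold=10))
```

Treating threshold as a floor follows its description as the lowest number of groups to be sampled additionally; the rounding and capping are choices made for this sketch.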
- Base(sample_size, time_sample, comparison=True, non_resp_rate=None, sample_strategy='Random', gen_plot: bool = False, saving_path_sampling=None, num_age_group=17, age_group_width=5, data_store_path='./input/', seed=None, saving_path_compare=None, scale_method='proportional')
This method samples the population and plots figures without age or region stratification.
Parameters:
- sample_size (int)
The size of the sample
- time_sample (list)
A list of time points at which to sample the population
- comparison (bool)
Turn on or off the comparison between the sampled result and the true result
Default = True
- sample_strategy (str)
Whether to sample different people at each sampling time point (‘Random’) or to keep sampling the same people (‘Same’)
Accepted values: [‘Random’, ‘Same’]
Default = ‘Random’
- gen_plot (bool)
Whether or not to generate plots
Default = False
- saving_path_sampling (str)
The path to save the figure showing the predicted infection level
Default = None
- saving_path_compare (str)
The path to save the figure showing the comparison between the predicted and true infection levels
Default = None
- scale_method (str)
A string specifying how the sampled data are compared with the true population
Default = ‘proportional’
- data_store_path (str)
The path to store data generated during sampling
Default = ‘./input/’
- seed (int or None)
The seed for random numbers
Default = None
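The two values of sample_strategy differ only in whether the sampled people change between time points. Below is a minimal sketch using simple random sampling without replacement from Python's random module; draw_samples is a hypothetical helper, and how epios draws its samples internally is not shown here.

```python
import random


def draw_samples(population_ids, sample_size, time_sample, sample_strategy="Random", seed=None):
    """Map each time point in time_sample to a list of sampled ids (hypothetical helper).

    'Random': draw a fresh simple random sample (without replacement) at every time point.
    'Same':   draw one sample and reuse it at every time point.
    """
    if sample_strategy not in ("Random", "Same"):
        raise ValueError(f"Unrecognised sample_strategy: {sample_strategy!r}")
    rng = random.Random(seed)
    ids = list(population_ids)
    if sample_strategy == "Same":
        fixed = rng.sample(ids, sample_size)
        return {t: list(fixed) for t in time_sample}
    return {t: rng.sample(ids, sample_size) for t in time_sample}


if __name__ == "__main__":
    same = draw_samples(range(1000), 5, [0, 7, 14], "Same", seed=42)
    fresh = draw_samples(range(1000), 5, [0, 7, 14], "Random", seed=42)
    print(same)
    print(fresh)
```

Using a seeded random.Random instance, rather than the module-level functions, keeps repeated runs with the same seed reproducible, which mirrors the role of the seed parameter.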
- Region(sample_size, time_sample, non_responder=False, comparison=True, non_resp_rate=None, sample_strategy='Random', gen_plot: bool = False, saving_path_sampling=None, data_store_path='./input/', sampling_percentage=0.1, proportion=0.01, threshold=None, seed=None, saving_path_compare=None, scale_method='proportional')
This method samples the population and plots figures using region stratification.
Parameters:
- sample_size (int)
The size of the sample
- time_sample (list)
A list of time points at which to sample the population
- non_responder (bool)
Turn on or off the non-responder function
Default = False
- non_resp_rate (float, between 0 and 1)
The probability that a person does not respond
Default = None
- comparison (bool)
Turn on or off the comparison between the sampled result and the true result
Default = True
- sample_strategy (str)
Whether to sample different people at each sampling time point (‘Random’) or to keep sampling the same people (‘Same’)
Accepted values: [‘Random’, ‘Same’]
Default = ‘Random’
- gen_plot (bool)
Whether or not to generate plots
Default = False
- saving_path_sampling (str)
The path to save the figure showing the predicted infection level
Default = None
- saving_path_compare (str)
The path to save the figure showing the comparison between the predicted and true infection levels
Default = None
- scale_method (str)
A string specifying how the sampled data are compared with the true population
Default = ‘proportional’
- sampling_percentage (float, between 0 and 1)
The proportion of additional samples taken from a specific (age-)regional group
Default = 0.1 (used only when non-responders are considered)
- proportion (float, between 0 and 1)
The proportion of all groups that receive additional samples
Default = 0.01 (used only when non-responders are considered)
- threshold (None or int)
The minimum number of groups that receive additional samples
Default = None (used only when non-responders are considered)
- data_store_path (str)
The path to store data generated during sampling
Default = ‘./input/’
- seed (int or None)
The seed for random numbers
Default = None
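The documentation does not spell out what scale_method=‘proportional’ computes. One common way to compare a stratified sample with the true population is to weight each stratum's sampled positive rate by the stratum's share of the population. The sketch below shows that estimator under this assumption; proportional_estimate and its input layout are hypothetical.

```python
def proportional_estimate(strata):
    """Estimate the population infection level from a stratified sample.

    Hypothetical helper. strata maps a stratum label (for example a region)
    to a tuple (population_size, sample_size, positives_in_sample). Each
    stratum's sampled positive rate is weighted by its population size.
    Strata with nobody sampled are skipped, and the weights of the remaining
    strata are renormalised over the population they cover.
    """
    weighted_sum = 0.0
    covered_population = 0
    for population_size, sample_size, positives in strata.values():
        if sample_size == 0:
            continue
        weighted_sum += population_size * (positives / sample_size)
        covered_population += population_size
    if covered_population == 0:
        return float("nan")
    return weighted_sum / covered_population


if __name__ == "__main__":
    strata = {"region_1": (1000, 10, 2), "region_2": (3000, 10, 5)}
    # (1000 * 0.2 + 3000 * 0.5) / 4000 = 0.425
    print(proportional_estimate(strata))
```

Skipping unsampled strata and renormalising is a design choice for this sketch; an alternative is to report those strata as missing.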
- best_method(methods, sample_size, hyperparameter_autotune=False, non_responder=False, non_resp_rate=None, sampling_interval=7, parallel_computation=True, metric='mean', iteration=100, data_store_path='./input/', **kwargs)
Prints the best method among the methods provided.
When hyperparameter autotuning is on, it first prints, for each method, the best parameter combination and its performance, then prints the best method across all methods.
- The best parameter set is printed in the following order:
( ‘num_age_group’, ‘age_group_width’, ‘sampling_percentage’, ‘proportion’, ‘threshold’ )
Parameters that do not apply to a method are omitted.
Features:
Automatically tunes the hyperparameters when ranges of parameter values are provided
Can optionally take non-responders into account
Prints any unrecognised inputs or methods
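The autotuning feature tries combinations of the candidate values in the *_range keyword arguments. A minimal sketch of such a grid search, assuming a plain Cartesian product and using the parameter order and default ranges listed in this documentation; parameter_grid is a hypothetical helper, and which parameters apply to which method is left to the caller.

```python
import itertools

# Printing order of the best parameter set, as stated in this documentation.
PARAM_ORDER = ("num_age_group", "age_group_width", "sampling_percentage", "proportion", "threshold")

# Default candidate values documented for the *_range keyword arguments.
DEFAULT_RANGES = {
    "num_age_group": [10, 13, 15, 17, 20],
    "age_group_width": [5, 10],
    "sampling_percentage": [0.1, 0.2, 0.3],
    "proportion": [0.01, 0.05, 0.1],
    "threshold": [10, 20, 30],
}


def parameter_grid(applicable, **kwargs):
    """Yield every candidate combination of the applicable parameters (hypothetical helper).

    applicable: names of the parameters that apply to one method; the
    others are omitted. Each yielded tuple follows PARAM_ORDER.
    kwargs: optional overrides such as age_group_width_range=[14, 17, 20].
    """
    ranges = dict(DEFAULT_RANGES)
    for key, values in kwargs.items():
        name = key[: -len("_range")] if key.endswith("_range") else None
        if name in ranges:
            ranges[name] = list(values)
        else:
            # Mirrors the documented behaviour of printing unrecognised inputs
            print(f"Unrecognised input: {key}")
    names = [name for name in PARAM_ORDER if name in applicable]
    yield from itertools.product(*(ranges[name] for name in names))


if __name__ == "__main__":
    grid = list(parameter_grid(["num_age_group", "age_group_width"],
                               age_group_width_range=[14, 17, 20]))
    print(len(grid), grid[0])  # 15 (10, 14)
```

The grid grows multiplicatively: with all five parameters and the default ranges there are 5 × 2 × 3 × 3 × 3 = 270 combinations, each run for the given number of iterations, which is why parallel computation is on by default.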
Parameters:
- methods (list)
A list of strings naming the methods to compare. Acceptable methods:
- Using the ‘Same’ strategy:
‘Age-Same’, ‘Region-Same’, ‘AgeRegion-Same’, ‘Base-Same’
- Using the ‘Random’ strategy:
‘Age-Random’, ‘Region-Random’, ‘AgeRegion-Random’, ‘Base-Random’
Note: if a method name is given without a sample strategy (e.g. ‘Age’), the ‘Random’ strategy is used by default
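- sample_size (int)
The size of the sample
- hyperparameter_autotune (bool)
Whether or not to turn on automatic hyperparameter tuning
For the additional inputs, see the documentation for the parameter ‘kwargs’ below
- non_responder (bool)
Whether or not to consider non-responders
- non_resp_rate (float, between 0 and 1)
The probability that a person does not respond
Default = None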
- sampling_interval (int)
The number of days between consecutive sampling time points
- metric (str)
The metric used to reduce the difference between the sampled result and the true infection level to a single float that measures performance. Acceptable metrics:
- ‘mean’:
The mean absolute difference between the true and predicted infection levels, ignoring all NaN values
- ‘max’:
The maximum absolute difference between the true and predicted infection levels
- iteration (int)
The number of iterations to run; the predictions are averaged over these iterations to obtain a robust result
- parallel_computation (bool)
Whether or not to use multiprocessing to speed up the repeated iterations
Default = True
Note: when this is turned on, the method cannot be called directly at the top level of a script; place the call under if __name__ == '__main__': as in the example below
- data_store_path (str)
The path to store files generated during sampling
Default = ‘./input/’
This is used only when parallel computation is disabled
- kwargs (dict)
Additional keyword arguments used by the sampling process. The following parameters can be passed:
- num_age_group (int)
The number of age groups
The last group includes every age at or above its lower bound
Default = 17
Used only when autotuning is turned off
- age_group_width (int)
The width of each age group (except for the last group)
Default = 5
Used only when autotuning is turned off
- sampling_percentage (float, between 0 and 1)
The proportion of additional samples taken from a specific (age-)regional group
Default = 0.1 (used only when non-responders are considered)
Used only when autotuning is turned off
- proportion (float, between 0 and 1)
The proportion of all groups that receive additional samples
Default = 0.01 (used only when non-responders are considered)
Used only when autotuning is turned off
- threshold (None or int)
The minimum number of groups that receive additional samples
Default = None (used only when non-responders are considered)
Used only when autotuning is turned off
- num_age_group_range (list)
The candidate numbers of age groups to try
Default = [10, 13, 15, 17, 20]
The last group includes every age at or above its lower bound
Used only when autotuning is turned on
- age_group_width_range (list)
The candidate age group widths (except for the last group) to try
Default = [5, 10]
Used only when autotuning is turned on
- sampling_percentage_range (list)
The candidate proportions of additional samples taken from a specific (age-)regional group
Default = [0.1, 0.2, 0.3] (used only when non-responders are considered)
Used only when autotuning is turned on
- proportion_range (list)
The candidate proportions of all groups that receive additional samples
Default = [0.01, 0.05, 0.1] (used only when non-responders are considered)
Used only when autotuning is turned on
- threshold_range (list)
The candidate minimum numbers of groups that receive additional samples
Default = [10, 20, 30] (used only when non-responders are considered)
Used only when autotuning is turned on
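To make the method names and metrics above concrete, here is a minimal sketch, assuming that a lower score means a better method. normalise_method, score and pick_best are hypothetical helpers, not part of the epios API. Following the descriptions above literally, NaN values are ignored for ‘mean’ only, so with ‘max’ a NaN difference propagates to the result.

```python
import numpy as np

BASE_METHODS = ("Age", "Region", "AgeRegion", "Base")
STRATEGIES = ("Random", "Same")


def normalise_method(name):
    """Return 'Method-Strategy', defaulting to 'Random' (hypothetical helper).

    Prints and returns None for an unrecognised method.
    """
    base, _, strategy = name.partition("-")
    strategy = strategy or "Random"
    if base not in BASE_METHODS or strategy not in STRATEGIES:
        print(f"Unrecognised method: {name}")
        return None
    return f"{base}-{strategy}"


def score(true_values, predicted_values, metric="mean"):
    """Reduce the absolute differences between true and predicted infection levels to one float."""
    diff = np.abs(np.asarray(true_values, dtype=float) - np.asarray(predicted_values, dtype=float))
    if metric == "mean":
        return float(np.nanmean(diff))  # NaN values are ignored
    if metric == "max":
        return float(np.max(diff))  # a NaN difference propagates to the result
    raise ValueError(f"Unrecognised metric: {metric!r}")


def pick_best(scores):
    """Return the method with the lowest score (assumption: lower is better)."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    methods = [normalise_method(m) for m in ["Age", "Base-Same", "Region-Random"]]
    print(methods)  # ['Age-Random', 'Base-Same', 'Region-Random']
    scores = {
        "Age-Random": score([0.10, 0.20], [0.12, 0.25]),
        "Base-Same": score([0.10, 0.20], [0.15, 0.30]),
    }
    print(pick_best(scores))
```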
Here is an example of using PostProcess:
```python
import epios
import pandas as pd

# Load the simulation output data
demo_data = pd.read_csv('demographics.csv')
time_data = pd.read_csv('inf_status_history.csv')

# Create the class instance
postprocess = epios.PostProcess(time_data=time_data, demo_data=demo_data)

# Do prediction and comparison based on age-region stratification
result, diff = postprocess.predict.AgeRegion(sample_size=3,
                                             time_sample=[0, 1, 2, 3],
                                             comparison=True,
                                             non_responder=False,
                                             gen_plot=True,
                                             sample_strategy='Random')

# Define the input keywords for finding the best method
best_method_kwargs = {
    'age_group_width_range': [14, 17, 20]
}

# Suppose we want to compare the methods Age-Random, Base-Same,
# Base-Random, Region-Random and AgeRegion-Random,
# and we want to turn on parallel computation to speed things up
if __name__ == '__main__':  # This line can be omitted when not using parallel computation
    postprocess.best_method(
        methods=[
            'Age',  # no strategy given, so 'Random' is used
            'Base-Same',
            'Base-Random',
            'Region-Random',
            'AgeRegion-Random'
        ],
        sample_size=3,
        hyperparameter_autotune=True,
        non_responder=False,
        sampling_interval=7,
        iteration=1,
        # When considering non-responders, set non_responder=True and add:
        # non_resp_rate=0.1,
        metric='mean',
        parallel_computation=True,
        **best_method_kwargs
    )
    # The results are then printed
```