Sampler
Overview:
- class epios.Sampler(data=None, data_store_path='./input/', pre_process=True, num_age_group=None, age_group_width=None, mode='Base')
Bases: DataProcess
The base sampling class.
This class performs fully random (unstratified) sampling at a single time point.
Parameters:
- data, data_store_path, pre_process
To input new data, pass it as data and set pre_process to True.
To reuse previously processed data, pass the location of the stored data files as data_store_path and set pre_process to False.
- num_age_group : int
The number of age groups; used when age stratification is enabled.
The last group is open-ended: it contains every age at or above a threshold.
- age_group_width : int
The width of each age group except the last; used when age stratification is enabled.
- mode : str
The mode used to process the data. It must be the name of a recognized mode.
To make this class sample as originally designed, leave this value unchanged.
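The base class draws a fully random sample at a single time point. The following minimal sketch shows that idea. It assumes the population is just a list of person IDs, and simple_random_sample is a hypothetical helper, not part of epios.

```python
import random


def simple_random_sample(population_ids, sample_size, seed=None):
    """Hypothetical helper: draw sample_size distinct IDs uniformly at random."""
    if sample_size > len(population_ids):
        raise ValueError("sample_size exceeds the population size")
    rng = random.Random(seed)  # local generator, reproducible when seeded
    # Random.sample draws without replacement, so nobody is picked twice
    return rng.sample(population_ids, sample_size)


population = [f"person{i}" for i in range(12)]
print(simple_random_sample(population, 5, seed=1))
```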
- person_allowed(sample: list, choice: str, threshold: int = 3)
Check whether the sampled person can be included in the generic sample.
Parameters:
- sample : list
List of people who have already been chosen
- choice : str
String ID of the person being sampled
- threshold : int
The maximum number of people sampled per household
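Below is a sketch of a household cap check. It assumes person IDs are dot-separated strings whose final field identifies the person within a household, so that the household is everything before the last dot. This ID format is an assumption, and person_allowed_sketch is a hypothetical stand-in rather than the epios implementation.

```python
def household_of(person_id: str) -> str:
    # Assumption: the household key is everything before the last '.'
    return person_id.rsplit(".", 1)[0]


def person_allowed_sketch(sample: list, choice: str, threshold: int = 3) -> bool:
    """Hypothetical helper: True if adding `choice` keeps its household under the cap."""
    household = household_of(choice)
    already = sum(1 for pid in sample if household_of(pid) == household)
    # At most `threshold` people per household, so allow only while below it
    return already < threshold


chosen = ["0.0.0.0", "0.0.0.1", "0.0.1.0"]
print(person_allowed_sketch(chosen, "0.0.0.2", threshold=2))  # False
print(person_allowed_sketch(chosen, "0.0.1.1", threshold=2))  # True
```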
- class epios.SamplerAge(data=None, data_store_path='./input/', pre_process=True, num_age_group=17, age_group_width=5, mode='Age')
Bases: Sampler
The sampling class with age stratification.
Parameters:
- data, data_store_path, pre_process
To input new data, pass it as data and set pre_process to True.
To reuse previously processed data, pass the location of the stored data files as data_store_path and set pre_process to False.
- num_age_group : int
The number of age groups.
The last group is open-ended: it contains every age at or above a threshold.
- age_group_width : int
The width of each age group except the last
- mode : str
The mode used to process the data. It must be the name of a recognized mode.
To make this class sample as originally designed, leave this value unchanged.
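Take the SamplerAge defaults (num_age_group=17, age_group_width=5), and assume the first group starts at age 0 and ages are whole years. The groups are then 0-4, 5-9, ..., 75-79 and 80+, so the open-ended threshold is (17 - 1) * 5 = 80. The following sketch shows that binning; age_group_index is a hypothetical helper.

```python
def age_group_index(age: int, num_age_group: int = 17, age_group_width: int = 5) -> int:
    """Hypothetical helper: map an age to its group index."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # Fixed-width bins below the threshold; every older age shares the last bin
    return min(age // age_group_width, num_age_group - 1)


threshold = (17 - 1) * 5
print(threshold)             # 80
print(age_group_index(4))    # 0  (ages 0-4)
print(age_group_index(37))   # 7  (ages 35-39)
print(age_group_index(79))   # 15 (ages 75-79)
print(age_group_index(95))   # 16 (ages 80 and over)
```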
- get_age_dist()
Read the age distribution from pop_dist.json, as produced by the DataProcess class.
Output:
- config : list
A list of floats summing to 1, with one entry per age group
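get_age_dist reads an already computed distribution. To show what that list contains, the sketch below computes a distribution of the same shape from raw ages. It does not parse pop_dist.json, whose layout is not described here, and age_distribution is a hypothetical helper.

```python
from collections import Counter


def age_distribution(ages, num_age_group=17, age_group_width=5):
    """Hypothetical helper: share of people in each age group (sums to 1)."""
    # Bin each age; the final group is open-ended
    counts = Counter(min(a // age_group_width, num_age_group - 1) for a in ages)
    total = len(ages)
    # Counter returns 0 for groups with nobody in them
    return [counts[g] / total for g in range(num_age_group)]


dist = age_distribution([3, 7, 8, 81, 90], num_age_group=3, age_group_width=5)
print(dist)  # [0.2, 0.4, 0.4]
```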
- multinomial_draw(n: int, prob: list)
Perform a multinomial draw with caps and return a tuple of lists. The first output is the number of people to draw from each age group.
Parameters:
- n : int
The sample size
- prob : list
List of floats summing to 1, with one entry per age group
Output:
- res : list
A list of integers giving the number of samples drawn from each age group
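The caps presumably stop a group from being allocated more people than it contains. One way to perform a capped multinomial draw is to allocate draws one at a time and drop a group once it is full. The sketch below takes the caps explicitly. It is not necessarily how epios implements the draw, and capped_multinomial is a hypothetical helper.

```python
import random


def capped_multinomial(n, prob, caps, seed=None):
    """Hypothetical helper: allocate n draws across groups in proportion to
    `prob`, never giving group i more than caps[i] draws."""
    capacity = sum(c for c, p in zip(caps, prob) if p > 0)
    if n > capacity:
        raise ValueError("n exceeds the capacity of groups with positive probability")
    rng = random.Random(seed)
    res = [0] * len(prob)
    for _ in range(n):
        # Only groups with positive probability and spare capacity stay eligible
        open_groups = [i for i, p in enumerate(prob) if p > 0 and res[i] < caps[i]]
        weights = [prob[i] for i in open_groups]
        # choices() renormalizes the weights, so a full group's probability
        # mass is spread over the groups that are still open
        res[rng.choices(open_groups, weights=weights)[0]] += 1
    return res


print(capped_multinomial(10, [0.5, 0.3, 0.2], caps=[2, 100, 100], seed=0))
```

When no cap binds, these sequential draws have the same distribution as an ordinary multinomial draw. The caps only change the outcome once a group is full.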
- class epios.SamplerRegion(data=None, data_store_path='./input/', pre_process=True, mode='Region')
Bases: Sampler
The sampling class with region stratification.
Parameters:
- data, data_store_path, pre_process
To input new data, pass it as data and set pre_process to True.
To reuse previously processed data, pass the location of the stored data files as data_store_path and set pre_process to False.
- mode : str
The mode used to process the data. It must be the name of a recognized mode.
To make this class sample as originally designed, leave this value unchanged.
- additional_nonresponder(non_resp_id: list, sampling_percentage=0.1, proportion=0.01, threshold=None)
Determine the additional samples to draw, based on the non-responder IDs.
Parameters:
- non_resp_id : list
A list of the non-responder IDs
- sampling_percentage : float, between 0 and 1
The proportion of additional samples taken from a given region group
- proportion : float, between 0 and 1
The proportion of all groups that receive additional samples
- threshold : None or int
The minimum number of region groups that receive additional samples
- Note: proportion and threshold together determine how many groups receive additional samples, but both depend on how many groups are eligible for additional sampling.
Output:
- additional_sample : list of length num_region_group
A list giving the number of additional samples to draw from each region group
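The sketch below shows one reading of how proportion, threshold and sampling_percentage could interact. It makes four assumptions, none confirmed by the text above:
- the non-responder IDs have already been tallied per region group;
- a group is eligible only if it contains non-responders;
- groups with more non-responders are chosen first;
- each chosen group receives sampling_percentage times its non-responder count, rounded up.

additional_per_group is a hypothetical helper.

```python
import math


def additional_per_group(nonresp_counts, sampling_percentage=0.1,
                         proportion=0.01, threshold=None):
    """Hypothetical helper mirroring additional_nonresponder's parameters.

    nonresp_counts[g] is the number of non-responders in region group g
    (assumed to be tallied from the non-responder IDs beforehand).
    """
    num_groups = len(nonresp_counts)
    # Assumption: only groups that contain non-responders are eligible
    eligible = [g for g, c in enumerate(nonresp_counts) if c > 0]
    n_groups = math.ceil(proportion * num_groups)
    if threshold is not None:
        n_groups = max(n_groups, threshold)  # threshold acts as a lower bound
    n_groups = min(n_groups, len(eligible))  # never more than the eligible groups
    # Assumption: prioritize the groups with the most non-responders
    chosen = sorted(eligible, key=lambda g: nonresp_counts[g], reverse=True)[:n_groups]
    additional = [0] * num_groups
    for g in chosen:
        # Assumption: redraw a fraction of the group's non-responders, rounded up
        additional[g] = math.ceil(sampling_percentage * nonresp_counts[g])
    return additional


print(additional_per_group([0, 12, 3, 7, 0], sampling_percentage=0.5,
                           proportion=0.2, threshold=2))  # [0, 6, 0, 4, 0]
```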
- get_region_dist()
Extract the geographic distribution from microcells.csv, as generated by the DataProcess class.
Output:
- dist : list
A list of floats summing to 1, with one entry per cell
- multinomial_draw(n: int, prob: list)
Perform a multinomial draw with caps and return a tuple of lists. The first output is the number of people to draw from each region group. The second output is returned for the convenience of the subsequent sampling function.
Parameters:
- n : int
The sample size
- prob : list
List of floats summing to 1, with one entry per region group
Output:
- res : list
A list of integers giving the number of samples drawn from each region group
- res_cap_block : list
A list of caps, one for each region group
- sample(sample_size: int, additional_sample: list | None = None, household_criterion=False, household_threshold: int = 3)
Given a sample size and, optionally, the additional samples, return a list of the IDs of people drawn from the population.
Parameters:
- sample_size : int
The sample size
- additional_sample : list
List of integers giving the number of additional samples drawn from each region group
- household_criterion : bool
Whether to apply the household criterion
- household_threshold : int
The maximum number of people sampled from one household
Output:
- res : list
A list of the IDs of the sampled people
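A sketch of how per-group counts, additional samples and the household criterion could fit together is shown below. It makes three assumptions:
- the population is already split into groups, given as lists of person IDs;
- counts comes from a prior draw;
- the household of an ID is everything before its last dot.

sample_sketch is a hypothetical helper, not the epios sample method.

```python
import random


def sample_sketch(groups, counts, additional_sample=None,
                  household_criterion=False, household_threshold=3, seed=None):
    """Hypothetical helper: draw counts[g] + additional_sample[g] people
    from each group g, optionally capping people per household."""
    rng = random.Random(seed)
    if additional_sample is None:
        additional_sample = [0] * len(counts)
    chosen = []
    per_household = {}
    for g, members in enumerate(groups):
        target = counts[g] + additional_sample[g]
        candidates = list(members)  # copy so the caller's list is not shuffled
        rng.shuffle(candidates)     # random order gives sampling without replacement
        taken = 0
        for pid in candidates:
            if taken == target:
                break
            # Assumption: the household key is everything before the last '.'
            household = pid.rsplit(".", 1)[0]
            full = per_household.get(household, 0) >= household_threshold
            if household_criterion and full:
                continue  # this household has reached its cap
            chosen.append(pid)
            per_household[household] = per_household.get(household, 0) + 1
            taken += 1
    return chosen


groups = [["0.0.0.0", "0.0.0.1", "0.0.0.2", "0.0.1.0"], ["1.0.0.0", "1.0.0.1"]]
print(sample_sketch(groups, [2, 1], additional_sample=[1, 0],
                    household_criterion=True, household_threshold=1, seed=0))
```

In this example, group 0 asks for three people, but the household cap of one allows only two, so the sketch returns three IDs in total. The text above does not say how epios handles such shortfalls.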
- class epios.SamplerAgeRegion(data=None, data_store_path='./input/', pre_process=True, num_age_group=17, age_group_width=5, mode='AgeRegion')
Bases: Sampler
The sampling class with age and region stratification.
Parameters:
- data, data_store_path, pre_process
To input new data, pass it as data and set pre_process to True.
To reuse previously processed data, pass the location of the stored data files as data_store_path and set pre_process to False.
- num_age_group : int
The number of age groups.
The last group is open-ended: it contains every age at or above a threshold.
- age_group_width : int
The width of each age group except the last
- mode : str
The mode used to process the data. It must be the name of a recognized mode.
To make this class sample as originally designed, leave this value unchanged.
- additional_nonresponder(non_resp_id: list, sampling_percentage=0.1, proportion=0.01, threshold=None)
Determine the additional samples to draw, based on the non-responder IDs.
Parameters:
- non_resp_id : list
A list of the non-responder IDs
- sampling_percentage : float, between 0 and 1
The proportion of additional samples taken from a given age-region group
- proportion : float, between 0 and 1
The proportion of all groups that receive additional samples
- threshold : None or int
The minimum number of age-region groups that receive additional samples
- Note: proportion and threshold together determine how many groups receive additional samples, but both depend on how many groups are eligible for additional sampling.
Output:
- additional_sample : 2D list with shape (num_region_group, num_age_group)
A list giving the number of additional samples to draw from each age-region group
- get_age_dist()
Read the age distribution from pop_dist.json, as produced by the DataProcess class.
Output:
- config : list
A list of floats summing to 1, with one entry per age group
- get_region_dist()
Extract the geographic distribution from microcells.csv, as generated by the DataProcess class.
Output:
- dist : list
A list of floats summing to 1, with one entry per cell
- multinomial_draw(n: int, prob: list)
Perform a multinomial draw with caps and return a tuple of lists. The first output is the number of people to draw from each age-region group. The second output is returned for the convenience of the subsequent sampling function.
Parameters:
- n : int
The sample size
- prob : list
List of floats summing to 1, with length equal to the number of age groups times the number of region groups
Output:
- res : list
A list of integers giving the number of samples drawn from each age-region group
- res_cap_block : list
A list of caps, one for each age-region group
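For SamplerAgeRegion, prob is a flat list with num_age_group * num_region_group entries, while additional_sample is a 2D list with shape (num_region_group, num_age_group). The text above does not state the order in which the flat list enumerates age-region groups. The sketch below assumes row-major order with region as the outer index and shows how to convert between the two layouts. All function names are hypothetical.

```python
def flat_index(region, age, num_age_group):
    # Assumption: row-major, region is the outer index and age the inner one
    return region * num_age_group + age


def unflat_index(k, num_age_group):
    return divmod(k, num_age_group)  # -> (region, age)


def flatten_2d(table):
    """Flatten a (num_region_group, num_age_group) nested list row by row."""
    return [x for row in table for x in row]


def combine(res_flat, additional_2d):
    """Add a 2D additional sample to a flat per-group draw result."""
    extra = flatten_2d(additional_2d)
    if len(extra) != len(res_flat):
        raise ValueError("shape mismatch between draw result and additional sample")
    return [a + b for a, b in zip(res_flat, extra)]


additional = [[0, 1, 0], [2, 0, 0]]   # 2 region groups x 3 age groups
res = [1, 1, 1, 1, 1, 1]
print(combine(res, additional))       # [1, 2, 1, 3, 1, 1]
print(flat_index(1, 0, 3), unflat_index(3, 3))  # 3 (1, 0)
```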
- sample(sample_size: int, additional_sample: list | None = None, household_criterion=False, household_threshold: int = 3)
Given a sample size and, optionally, the additional samples, return a list of the IDs of people drawn from the population.
Parameters:
- sample_size : int
The sample size
- additional_sample : list
List of integers giving the number of additional samples drawn from each age-region group
- household_criterion : bool
Whether to apply the household criterion
- household_threshold : int
The maximum number of people sampled from one household
Output:
- res : list
A list of the IDs of the sampled people