Decotools API
File I/O
There are two functions used to retrieve DECO data: get_iOS_files and get_android_files.
decotools.fileio_iOS.get_iOS_files(start_date=None, end_date=None, data_dir='/net/deco/iOSdata', include_events=True, include_min_bias=False, phone_model=None, device_id=None, return_metadata=False, n_jobs=1, verbose=0)
Function to retrieve DECO iOS image files.
Parameters: start_date : str, optional
Starting date for the iOS files to retrieve. Use any common date format (e.g. ‘2017-01-01’, ‘20170101’, ‘Jan 1, 2017’, etc). Default starting date is ‘2016.01.01’.
end_date : str, optional
Ending date for the iOS files to retrieve. Use any common date format (e.g. ‘2017-01-01’, ‘20170101’, ‘Jan 1, 2017’, etc). Default is the current date.
data_dir : str, optional
Base directory to search for iOS image files.
include_events : bool, optional
Option to include image files flagged as events. Default is True.
include_min_bias : bool, optional
Option to include minimum bias image files. Default is False.
phone_model : str or array-like, optional
Option to specify which phone models you would like to look at. Can be either a string, e.g. ‘iPhone 7’, or a list of models, e.g. [‘iPhone 5’, ‘iPhone 5s’]. Default is to include all phone models.
device_id : str or array-like, optional
Option to specify which devices you want to look at. Can either be a string, e.g. ‘EFD5764E-7209-4579-B0A8-EAF80C950147’, or a list of device IDs, e.g. [‘EFD5764E-7209-4579-B0A8-EAF80C950147’, ‘F216114B-8710-4790-A05D-D645C9C79C27’]. Default is to include all device IDs.
return_metadata : boolean, optional
Return a DataFrame with metadata information for each image file (default is False).
n_jobs : int, optional
The number of jobs to run in parallel (default is 1).
verbose : int {0, 1, or 2}, optional
Verbosity level when getting files: 0 is the least verbose and 2 is the most verbose (default is 0).
Returns: numpy.ndarray
Numpy array containing files that match specified criteria
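A minimal usage sketch, assuming decotools is installed and the default data_dir is readable. The dates and phone models are illustrative values, and the wrapper name load_ios_files is hypothetical. The import is deferred so that the snippet loads even on a machine without decotools or the data directory.

```python
# Illustrative query; every key is a documented get_iOS_files parameter.
IOS_QUERY = {
    "start_date": "2017-01-01",
    "end_date": "2017-01-31",
    "phone_model": ["iPhone 5", "iPhone 5s"],
    "include_min_bias": False,
    "n_jobs": 4,
    "verbose": 1,
}


def load_ios_files(query=IOS_QUERY):
    """Hypothetical wrapper: return the array of matching iOS image files."""
    # Imported here so that defining the wrapper does not require decotools.
    from decotools.fileio_iOS import get_iOS_files

    return get_iOS_files(**query)
```

Calling load_ios_files() should return the numpy.ndarray of file paths described above, provided the data directory exists.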
decotools.fileio_android.get_android_files(start_date=None, end_date=None, data_dir='/net/deco/deco_data', db_file='/net/deco/db_hourly_safe.csv', include_events=True, include_min_bias=False, device_id=None, return_metadata=False, verbose=0)
Function to retrieve DECO Android image files.
Parameters: start_date : str, optional
Starting date for the files to retrieve. Use any common date format (e.g. ‘2017-01-01’, ‘20170101’, ‘Jan 1, 2017’, etc). Default starting date is ‘2010.01.01’.
end_date : str, optional
Ending date for the files to retrieve. Use any common date format (e.g. ‘2017-01-01’, ‘20170101’, ‘Jan 1, 2017’, etc). Default is the current date.
data_dir : str, optional
Base directory to retrieve android image files.
db_file : str, optional
File path to android database.
include_events : bool, optional
Option to include image files flagged as events. Default is True.
include_min_bias : bool, optional
Option to include minimum bias image files. Default is False.
device_id : str or array-like, optional
Option to specify which devices you want to look at. Can either be a string, e.g. ‘DECO-00000000-450a-7561-433f-0516209b4922’, or a list of device IDs, e.g. [‘DECO-00000000-450a-7561-433f-0516209b4922’, ‘DECO-ffffffff-bd6f-e5fb-842b-56b10033c587’]. Default is to include all device IDs.
return_metadata : boolean, optional
Return a DataFrame with metadata information for each image file (default is False).
verbose : int {0, 1, or 2}, optional
Verbosity level when getting files: 0 is the least verbose and 2 is the most verbose (default is 0).
Returns: numpy.ndarray
Numpy array containing files that match specified criteria
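A comparable sketch for Android data, again assuming decotools is installed and that the default data_dir and db_file are readable. The device IDs are the ones quoted in the parameter description; the dates and the wrapper name load_android_files are hypothetical.

```python
# Illustrative query; every key is a documented get_android_files parameter.
ANDROID_QUERY = {
    "start_date": "2017-01-01",
    "end_date": "2017-03-01",
    "device_id": [
        "DECO-00000000-450a-7561-433f-0516209b4922",
        "DECO-ffffffff-bd6f-e5fb-842b-56b10033c587",
    ],
    "include_events": True,
    "include_min_bias": True,
}


def load_android_files(query=ANDROID_QUERY):
    """Hypothetical wrapper: return the array of matching Android image files."""
    # Imported here so that defining the wrapper does not require decotools.
    from decotools.fileio_android import get_android_files

    return get_android_files(**query)
```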
Blob Extraction
Finding interesting structure in images is done using the extract_blobs function.
decotools.blob_extraction.extract_blobs(image_file, threshold=20.0, rgb_sum=False, min_area=10.0, max_area=1000.0, max_dist=5.0, group_max_area=None, size=None)
Function to perform blob detection on an input image.
Blobs are found using the marching squares algorithm implemented in scikit-image.
Parameters: image_file : str
Path to image file.
threshold : float, optional
Threshold for blob detection. Only pixels with an intensity above this threshold will be used in blob detection (default: 20).
rgb_sum : bool, optional
Whether to use a simple RGB sum to convert the image to grayscale, or to use a weighted RGB sum (default: False).
min_area : float, optional
Minimum area for a blob to be kept. This helps get rid of noise in an image (default: 10).
max_area : float, optional
Maximum area for a blob to be kept. This helps get rid of pathological events in an image (default: 1000).
max_dist : float, optional
Distance scale for grouping nearby blobs. If two blobs are separated by less than max_dist, they are grouped together as a single blob (default: 5).
group_max_area : float, optional
Maximum area for a blob group to be kept. This helps get rid of pathological events in an image (default: None).
size : {None, int, array-like of shape=(2,)}, optional
Size of zoomed image of extracted blobs. If an integer is provided, the zoomed image will be a square box of size 2*size in each dimension. If an array-like object (of shape=(2,)) is provided, then the zoomed image will be of size 2*size[0] by 2*size[1]. Otherwise, the default behavior is to return a square image of size twice the equivalent diameter of the blob.
Returns: pandas.DataFrame
A DataFrame containing information about the found blobs is returned. Each row in the DataFrame corresponds to a blob group, while each column corresponds to a pertinent quantity (area, eccentricity, zoomed image array, etc.).
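The max_dist grouping can be pictured with a small standalone sketch. It is not the decotools implementation: it assumes that the separation is measured between blob centroids and that grouping is transitive (a chain of close blobs forms one group), neither of which is stated above. The function name group_blobs is hypothetical.

```python
from itertools import combinations
from math import hypot


def group_blobs(centroids, max_dist=5.0):
    """Group blob indices whose centroids are linked by gaps below max_dist."""
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair of blobs closer than max_dist.
    for i, j in combinations(range(len(centroids)), 2):
        (x1, y1), (x2, y2) = centroids[i], centroids[j]
        if hypot(x1 - x2, y1 - y2) < max_dist:
            parent[find(i)] = find(j)

    # Collect blob indices by group root.
    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


centroids = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (50.0, 50.0)]
groups = group_blobs(centroids, max_dist=5.0)
```

Here the first three blobs end up in one group because each is within 5 pixels of its neighbour, while the distant fourth blob stays on its own.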
Metrics
Functions to calculate intensity metrics and histogram images are in the metrics module.
decotools.metrics.get_rgb_hists(files, cumulative=False, n_jobs=1)
Calculates histograms of the pixel RGB sum distributions.
Parameters: files : str, sequence
Image file path (or sequence of file paths) to be analyzed.
cumulative : bool, optional
Option to calculate cumulative histograms. If True, each histogrammed quantity is the number of pixels with an RGB sum above the corresponding threshold (default is False).
n_jobs : int, optional
The number of jobs to run in parallel (default is 1).
Returns: hists : pandas.DataFrame
DataFrame containing histograms of pixel RGB sums. Each row of the DataFrame corresponds to a single image and each column corresponds to an RGB sum value.
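The difference between the plain and cumulative histograms can be sketched in pure Python for one image. This is an illustration of the idea rather than the decotools code: it assumes 8-bit channels (RGB sums from 0 to 765) and a strict "greater than" comparison for the cumulative counts. The function name rgb_sum_hist is hypothetical.

```python
def rgb_sum_hist(pixels, cumulative=False):
    """Histogram the RGB sums of an iterable of (r, g, b) pixels."""
    hist = [0] * (3 * 255 + 1)  # one bin per possible RGB sum, 0..765
    for r, g, b in pixels:
        hist[r + g + b] += 1
    if not cumulative:
        return hist

    # above[t] = number of pixels whose RGB sum is strictly greater than t.
    above = [0] * len(hist)
    running = 0
    for value in range(len(hist) - 1, -1, -1):
        above[value] = running
        running += hist[value]
    return above


pixels = [(0, 0, 0), (10, 10, 10), (255, 255, 255)]
hist = rgb_sum_hist(pixels)
above = rgb_sum_hist(pixels, cumulative=True)
```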
decotools.metrics.get_intensity_metrics(files, rgb_sum=False, n_jobs=1)
Calculates various metrics related to the image intensity.
Parameters: files : str, sequence
Image file path (or sequence of file paths) to be analyzed.
rgb_sum : bool, optional
Option to use simple RGB sum for grayscale conversion (default is to use weighted RGB sum).
n_jobs : int, optional
The number of jobs to run in parallel (default is 1).
Returns: image_intensities : pandas.DataFrame
DataFrame with intensity metrics
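The rgb_sum option switches between two grayscale conversions. The sketch below contrasts them for a single pixel. The weights are the common ITU-R BT.601 luma coefficients, used here as an assumption because the weights decotools applies are not stated above; the function name to_gray is hypothetical.

```python
# Assumed weights (ITU-R BT.601 luma); decotools may use different ones.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_gray(r, g, b, rgb_sum=False):
    """Convert one RGB pixel to a grayscale intensity."""
    if rgb_sum:
        # Simple sum: every channel counts equally, range 0..765 for 8-bit input.
        return r + g + b
    # Weighted sum: channels contribute according to the luma weights.
    wr, wg, wb = LUMA_WEIGHTS
    return wr * r + wg * g + wb * b


simple = to_gray(10, 20, 30, rgb_sum=True)
weighted = to_gray(10, 20, 30)
```

With these weights a pure green pixel contributes more to the weighted intensity than a pure blue one, whereas the simple sum treats them identically.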
Event classification
CNN event classification is done using the CNN object.
class decotools.convnet.CNN(weights_file=None, model_file=None, custom_model=None, training=False, n_classes=4)
CNN class
Parameters: weights_file : str, optional
Path and file name of an hdf5 file containing the trained model weights to be used by the CNN (default is None).
model_file : str, optional
Path and file name of an hdf5 file containing a trained model. Typically, this should only be used when continuing an existing training session (default is None).
custom_model : keras model, optional
User-defined, compiled keras model to be used in place of the default (default is None).
training : bool, optional
If True, initializes the model structure used for training. If False, initializes the model structure used for predictions (default is False).
n_classes : int, optional
Number of classes to be used by the CNN (default is 4).
evaluate(images, labels, batch_size=32, verbose=0)
Evaluate accuracy and loss of model predictions.
Parameters: images : numpy.ndarray
Array of grayscale, normalized images to be used for evaluation. Input shape = (n_images, n_rows, n_cols, 1).
labels : numpy.ndarray
Array of labels to be used for evaluation, shape=(n_images,).
batch_size : int, optional
Batch size to use for predictions (default is 32).
verbose : int, optional
Verbosity mode to use, 0 or 1 (default is 0).
Returns: list
list containing [test_loss, test_accuracy]
fit(train_images, train_labels, test_images=None, test_labels=None, cv=None, batch_size=64, seed=None, epochs=800, initial_epoch=0, smooth_factor=0.004, horizontal_flip=True, vertical_flip=True, width_shift_range=0.08, height_shift_range=0.08, rotation_range=180.0, zoom_range=(0.9, 1.1), fill_mode='constant', cval=0, shuffle=True, save_model=None, save_weights=None, save_history=None, check_point=None, check_point_weights_only=True, verbose=False)
Fit CNN
Parameters: train_images : numpy.ndarray
Array of grayscale, normalized images to be used for training the CNN. Input shape = (n_images, n_rows, n_cols, 1).
train_labels : numpy.ndarray
Array of training labels, shape=(n_images,).
test_images : numpy.ndarray, optional
Array of grayscale, normalized images to be used for testing the CNN. Input shape = (n_images, n_rows, n_cols, 1).
test_labels : numpy.ndarray, optional
Array of testing labels, shape=(n_images,).
cv : int, scikit-learn cross validator, None, optional
Option for cross-validation fitting. If cv is an integer, sklearn.model_selection.StratifiedKFold will be used with cv number of folds. Other cross validators from sklearn.model_selection can be passed to cv as well (default is None).
batch_size : int, optional
Number of samples per gradient update (default is 64).
seed : int, optional
Random seed to be used for reproducibility (default is None).
epochs : int, optional
Number of epochs to train the model. Note that in conjunction with initial_epoch, the parameter epochs is to be understood as the “final epoch” (default is 800).
initial_epoch : int, optional
Epoch at which to start training. Useful for resuming a previous training run (default is 0).
smooth_factor : float in range (0, 1), optional
Level of smoothing to apply to the one-hot label vector. For example, a smooth_factor of 0.004 applied to [0, 1, 0, 0] results in [0.001, 0.997, 0.001, 0.001] (default is 0.004).
horizontal_flip : bool, optional
Randomly flip inputs horizontally (default is True).
vertical_flip : bool, optional
Randomly flip inputs vertically (default is True).
width_shift_range : float, optional
Range for random horizontal shifts (default is 0.08).
height_shift_range : float, optional
Range for random vertical shifts (default is 0.08).
rotation_range : int, optional
Degree range for random rotations (default is 180).
zoom_range : float or (lower, upper), optional
Range for random zoom. If a float, (lower, upper) = (1-zoom_range, 1+zoom_range) (default is (0.9, 1.1)).
fill_mode : {“constant”, “nearest”, “reflect” or “wrap”}, optional
Points outside the boundaries of the input are filled according to the given mode (default is “constant”).
cval : float or int, optional
Value used for interpolated pixels when fill_mode="constant" (default is 0).
shuffle : bool, optional
Whether to shuffle the order of the batches at the beginning of each epoch (default is True).
save_model : str, optional
If specified, a copy of the model from the final training epoch will be saved. For example, save_model='my_model.h5'. Typically used for continued training (default is None).
save_weights : str, optional
If specified, a copy of the model weights from the final training epoch will be saved. For example, save_weights='my_weights.h5' (default is None).
save_history : str, optional
If specified, the training history (accuracy and loss for training and testing) from each epoch will be saved to a CSV file. For example, save_history='my_history.csv' (default is None).
check_point : str, optional
If specified, saves a running copy of the model corresponding to the lowest validation loss epoch. Each time a new low is reached, the previous best model is overwritten by the new one. For example, check_point='my_checkpoint.h5' (default is None).
check_point_weights_only : bool, optional
If True, only the model’s weights will be saved in the check point. Otherwise the full model is saved. Ignored if check_point=False (default is True).
verbose : bool, optional
Option for verbose output (default is False).
Returns: self : CNN
Trained CNN.
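The smooth_factor example given for fit is consistent with the usual label-smoothing rule, in which each one-hot entry y becomes y * (1 - smooth_factor) + smooth_factor / n_classes. The sketch below applies that rule in pure Python; it is an assumption that decotools uses exactly this formula, although it reproduces the documented numbers. The function name smooth_labels is hypothetical.

```python
def smooth_labels(one_hot, smooth_factor=0.004):
    """Apply label smoothing to a single one-hot label vector."""
    n_classes = len(one_hot)
    # Shrink every entry toward the uniform distribution over the classes.
    return [y * (1.0 - smooth_factor) + smooth_factor / n_classes for y in one_hot]


smoothed = smooth_labels([0, 1, 0, 0], smooth_factor=0.004)
```

The smoothed vector still sums to 1, so it remains a valid target distribution for a softmax output.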
predict(images, batch_size=32, verbose=0)
Predict classifications for an input image array.
Parameters: images : numpy.ndarray
Array of grayscale, normalized images to be used for class predictions. Input shape = (n_images, n_rows, n_cols, 1).
batch_size : int, optional
Batch size to use for predictions (default is 32).
verbose : int, optional
Verbosity mode to use, 0 or 1 (default is 0).
Returns: numpy.ndarray
Array containing class probabilities for each image. The array output is ordered as follows: [n_images, p(worm), p(spot), p(track), p(noise)]
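To turn the predicted probabilities into class names, one option is to take the most probable column for each image. The sketch below assumes the output has one row per image with four columns in the order listed above (worm, spot, track, noise); the probabilities shown are made-up values, and the function name most_likely_classes is hypothetical.

```python
# Column order as documented for the predict output.
CLASS_NAMES = ("worm", "spot", "track", "noise")


def most_likely_classes(probabilities):
    """Return the highest-probability class name for each row."""
    labels = []
    for row in probabilities:
        # Index of the largest probability in this row.
        best = max(range(len(CLASS_NAMES)), key=lambda k: row[k])
        labels.append(CLASS_NAMES[best])
    return labels


# Made-up probabilities for two images, for illustration only.
example = [
    [0.05, 0.90, 0.03, 0.02],
    [0.10, 0.10, 0.70, 0.10],
]
labels = most_likely_classes(example)
```

The same function works on the rows of a NumPy array, since it only relies on indexing each row.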