A machine learning project leaves a lot of room for flexibility and user control, but ML workflows generally follow the life cycle below.
Business Case Definition
Before starting a machine learning project, step back and define the business problem at hand. It takes some finesse to match a business problem with the appropriate algorithm and the appropriate data. Often, business problems fall into one of the following categories.
An enterprise can produce very large amounts of data across different systems. To be used in the Infor Coleman ML platform, that data should be managed and stored in the Infor Data Lake.
Data preparation tasks can be numerous and time-consuming. Familiarity with tidy data standards helps when cleaning data. It is also important to understand any caveats of the particular algorithm that you plan to use to process the data. Coleman provides the following built-in tools for data manipulation and preparation, as well as the ability to run Python scripts for complete customization.
|Select Columns||Select or exclude a subset of columns from the current dataset.|
|Remove Duplicates||Remove duplicates in selected features.|
|Construct Feature||Create a new feature out of the existing ones by using mathematical, logical, or casting operations.|
|Index Data||Transform categorical values into numeric for the selected columns. Each category will be assigned a number according to its occurrence in the data, highest occurrence having number 0.|
|Smooth Data||Remove noise from a dataset to allow natural patterns to stand out.|
|Split Data||Split the dataset into training data and test data by specifying the split ratio for the training dataset.|
|Scripting||Execute a customized Python script to perform an activity which is not available in the catalog.|
|Ingest to Data Lake||Ingest data to Infor Data Lake.|
|One Hot Encoder||Transform categorical features into a binary matrix (vectors) to distinguish each categorical label. The vector consists of 0s in all cells, with the exception of a single 1 in a cell used uniquely to identify the label.|
|Feature Scaling||Scale features with varying magnitudes, units and range into normalized values.|
|Handle Missing Data||Replace missing values in selected features (with mean / mode / constant value / interpolation), or remove the entire row exceeding a selected ratio of missing data.|
|Target Encoder||Convert a categorical variable to a single numerical variable using the target. Each category is replaced with the probability of the target for that category (if the target is categorical) or the average of the target for that category (if the target is numerical).|
|Edit Metadata||Select the Target label. Edit the metadata of the selected features by changing its data type, tagging the categorical values, changing the variable name or defining their machine learning type.|
|Balance Data||Balance the dataset using undersampling or oversampling methods.|
|Execute SQL||Run SQL operations (filter data, join datasets, aggregate data, etc.).|
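To make the three encoding activities in the table more concrete, here is a minimal plain-Python sketch of what Index Data, One Hot Encoder, and Target Encoder are described as doing. It is not Coleman's implementation. The function names, the sample data, and the alphabetical tie-breaking and column ordering are assumptions made for illustration. A script along these lines could be adapted for the Scripting activity when custom behavior is needed.

```python
from collections import Counter, defaultdict


def index_by_frequency(values):
    """Map categories to integers; the most frequent category gets 0."""
    counts = Counter(values)
    # Descending count; ties are broken alphabetically (an assumption).
    ordered = sorted(counts, key=lambda c: (-counts[c], str(c)))
    mapping = {category: i for i, category in enumerate(ordered)}
    return [mapping[v] for v in values], mapping


def one_hot(values):
    """Return one binary vector per value with a single 1 marking its category."""
    categories = sorted(set(values), key=str)  # column order is an assumption
    position = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[position[v]] = 1
        vectors.append(row)
    return vectors, categories


def target_encode(values, targets):
    """Replace each category with the mean of a numeric target for that category."""
    sums = defaultdict(float)
    counts = Counter()
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    means = {c: sums[c] / counts[c] for c in counts}
    return [means[v] for v in values], means


# Hypothetical sample data: a categorical feature and a 0/1 target.
colors = ["red", "blue", "red", "green", "red", "blue"]
bought = [1, 0, 1, 0, 0, 1]

indexed, index_map = index_by_frequency(colors)
encoded, categories = one_hot(colors)
target_encoded, target_means = target_encode(colors, bought)

print(index_map)
print(categories, encoded[0])
print(target_means)
```

With a 0/1 target, the per-category mean is also the per-category probability of the positive class, which is why one function covers both cases mentioned in the Target Encoder description.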
Training a model requires the prepared dataset and the algorithm to train with. Supervised algorithms can be scored for accuracy using the train/test split functionality together with the score and evaluate model blocks. The compare model block trains multiple models so that their performance statistics can be compared.
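The split, train, score, and compare steps can be sketched outside the platform as follows. This is an illustrative sketch only: the toy threshold classifier, the majority-class baseline, the synthetic data, and all function names are assumptions, and none of them correspond to Coleman blocks.

```python
import random


def train_test_split(rows, train_ratio, seed=0):
    """Shuffle a copy of the rows and cut it at the given training ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train):
    """Toy model: threshold halfway between the two class means of one feature."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > threshold else 0


def fit_majority(train):
    """Baseline model: always predict the most common training label."""
    ones = sum(y for _, y in train)
    majority = 1 if ones * 2 >= len(train) else 0
    return lambda x: majority


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


# Synthetic data: the label is 1 when the single feature is at least 50.
data = [(x, 1 if x >= 50 else 0) for x in range(100)]
train, test = train_test_split(data, train_ratio=0.8)

# Train two models on the same split and compare them on the held-out rows.
scores = {
    "threshold": accuracy(fit_threshold(train), test),
    "majority_baseline": accuracy(fit_majority(train), test),
}
print(scores)
```

Scoring on rows the model never saw during training is the point of the split: accuracy measured on the training rows alone tends to overstate how well the model generalizes.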
Model Fitting & Tuning
Algorithm hyperparameters are available in each of the algorithm blocks. The specific parameters differ depending on the algorithm selected, and details for each hyperparameter can be found in the documentation of the chosen algorithm.
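As a rough illustration of what tuning a hyperparameter involves, the sketch below runs a small grid search over `k` for a toy one-dimensional k-nearest-neighbors classifier. The classifier, the grid values, the synthetic data with one deliberately mislabeled row, and every function name are assumptions for illustration; they are not Coleman algorithm blocks or their parameters.

```python
from itertools import product


def knn_predict(train, x, k):
    """Majority vote among the k training rows whose feature is closest to x."""
    neighbors = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if votes * 2 > k else 0


def accuracy(train, validation, k):
    """Fraction of validation rows classified correctly for a given k."""
    hits = sum(1 for x, y in validation if knn_predict(train, x, k) == y)
    return hits / len(validation)


def grid_search(train, validation, grid):
    """Score every combination in the grid and return the best plus all results."""
    names = list(grid)
    results = []
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((accuracy(train, validation, **params), params))
    best = max(results, key=lambda item: item[0])
    return best, results


# Synthetic data: even features for training, odd features for validation.
train = [(x, 1 if x >= 50 else 0) for x in range(0, 100, 2)]
train[20] = (40, 1)  # one deliberately mislabeled row to simulate noise
validation = [(x, 1 if x >= 50 else 0) for x in range(1, 100, 2)]

(best_score, best_params), all_results = grid_search(
    train, validation, {"k": [1, 3, 5]}
)
print(best_score, best_params)
```

In this toy setup a larger `k` smooths over the mislabeled row, which is the kind of trade-off a hyperparameter controls. In practice, the best value should be chosen on validation data and confirmed on a separate test set.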
In the quest, select the checkbox on each activity that the deployed model needs, then push those activities to the production quest. The production quest can be deployed as an endpoint accessible through the ION API gateway.
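Once an endpoint is deployed, a client typically sends it records to score over HTTPS. The sketch below only builds such a request and does not send it. The URL, the token, the `records` payload shape, and the feature names are all placeholders and assumptions. The actual endpoint path, authentication flow, and request schema should be taken from the ION API documentation for the deployed endpoint.

```python
import json
import urllib.request


def build_prediction_request(endpoint_url, access_token, records):
    """Build (but do not send) a JSON POST request carrying a bearer token.

    The {"records": [...]} body is a hypothetical payload shape.
    """
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values only; none of these identify a real endpoint.
request = build_prediction_request(
    "https://example.invalid/hypothetical/prediction-endpoint",
    "PLACEHOLDER_TOKEN",
    [{"feature_a": 1.5, "feature_b": "red"}],
)

print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
# It is omitted here because the URL above is a placeholder.
```

Keeping request construction separate from sending makes it easier to test the payload and headers without network access.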
These best practices offer essential guidance to enhance your processes. For a personalized and thorough implementation tailored to your needs, reach out to Infor Professional Services. Their expertise ensures optimal results for your unique challenges.