EinsteinDB and MilevaDB

5/4/2022

Introduction

# **Feature Engineering**
#
#
# *   Use all attributes except `tableName`, `isPrimaryKey` and `numAttributes`. A vectorizer could be used to transform categorical values into numerical ones. However, since no column has missing values, we skip that step and use the raw data as is, with no transformation.
#
# *   Split the dataset into train and test sets, with 80% for training and 20% for testing, using stratified sampling over the classes so that both sets have a similar distribution of target classes (0s vs. 1s). Before splitting, first count how many instances exist per class, so that we can reduce the overall dataset size if needed. For example, if one class contains 90% of all samples, randomly removing 10% from another class won't yield good results, because that class most likely had very few samples to begin with compared to its larger counterpart. Instead, take random samples evenly distributed across each class. Or, even better, sample 30k rows uniformly at random across each class (to do later).
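
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual pipeline: the function names, the `label` key, the `x` feature, and the synthetic 90/10 data are all hypothetical, and only the three excluded column names come from the notes. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would do the same job.

```python
import random
from collections import Counter, defaultdict

# Attributes the notes say to exclude from the feature set.
EXCLUDED = {"tableName", "isPrimaryKey", "numAttributes"}


def select_features(row, label_key="label"):
    """Drop the excluded attributes and the label from one record (a dict).

    `label_key` is a hypothetical name for the target column.
    """
    return {k: v for k, v in row.items() if k not in EXCLUDED and k != label_key}


def class_counts(labels):
    """Count instances per class, to inspect imbalance before splitting."""
    return Counter(labels)


def _indices_by_class(labels):
    """Group row indices by their class label."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    return by_class


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps a similar share in both."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def sample_per_class(labels, n_per_class, seed=0):
    """Draw up to n_per_class indices uniformly at random from each class."""
    rng = random.Random(seed)
    picked = []
    for idx in _indices_by_class(labels).values():
        picked.extend(rng.sample(idx, min(n_per_class, len(idx))))
    return sorted(picked)


if __name__ == "__main__":
    # Synthetic, illustrative data: 90 rows of class 0 and 10 rows of class 1.
    labels = [0] * 90 + [1] * 10
    rows = [
        {"tableName": "t", "isPrimaryKey": 0, "numAttributes": 3, "x": i, "label": y}
        for i, y in enumerate(labels)
    ]
    features = [select_features(r) for r in rows]
    print(class_counts(labels))
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
```

Splitting per class rather than over the whole dataset is what keeps the 0/1 ratio similar in both sets. `sample_per_class` corresponds to the "sample evenly across each class" idea, and it caps the draw at the class size so a small class is never oversampled.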

The World's First Relativistic Linearizable SQL-Agnostic soon-to-be NoCode NoDBA hybrid HTAP