feature engineering

Innocent Iguana answered on January 13, 2024 Popularity 10/10 Helpfulness 2/10

- Create new features (e.g., averages, or ratios such as BMI)
- Visualize distributions with boxplots or a pairplot of the dataset to see whether a transformation is necessary (e.g., a log transformation for skewed features)
- Normalize/Standardize/Scale features
- Encoding: convert categories into numeric data
    - One-hot encoding: explainable features, creates N columns for N categories
    - Dummy encoding: the same information without duplication, creates N-1 columns for N categories (the dropped category is implied when all other columns are 0)
- Merge low-frequency categorical values (uncommon categories) into a single category (e.g., `other`)
- Binarise numeric values (e.g., from `num_violations` to `violation_boolean`)
- Deal with missing values:
    - drop columns or rows whose share of missing values is beyond a threshold (e.g., >30% of the dataset)
    - fill values that are missing completely at random (with the mean, median, mode, an `Other` category, or the next present value in sorted order)
- Deal with outliers
- Validate numeric columns
    - remove non-numeric characters from numeric data (e.g., `$` or `,` in currency values)
    - make sure the column has the proper data type (e.g., `float`, `int`)
- For text processing: generate numeric features
    1. Remove unwanted/non-letter characters
    2. Standardize text: convert to lowercase or uppercase
    3. Generate feature, mean word length: average length of the words in a text = character_count / word_count
    4. Generate feature, bag of words: word count vector = number of times each word appears in a text
    5. Generate feature, normalized significance of words: calculate TF-IDF, which weights a word's frequency in a document by how rare the word is across all documents
    6. Generate feature, contextual significance of word sequences: calculate TF-IDF over n-grams (sequences of n consecutive words) instead of single words, so that some word order is kept
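
The tabular steps in the list above can be sketched with pandas. This is only an illustration under assumptions: the toy data, all column names except `num_violations` and `violation_boolean`, the rare-category cutoff of 2, and the IQR rule for outliers are made up here and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values and most column names are invented.
df = pd.DataFrame({
    "weight_kg": [70.0, 82.0, None, 95.0, 60.0, 77.0],
    "height_m": [1.75, 1.80, 1.65, 1.90, 1.60, 1.72],
    "income": ["$52,000", "$61,500", "$48,250", "$1,200,000", "$39,900", "$58,000"],
    "city": ["Paris", "Paris", "Lyon", "Nice", "Paris", "Lyon"],
    "num_violations": [0, 2, 0, 5, 1, 0],
})

# Validate numeric columns: strip "$" and "," and cast to float.
df["income"] = df["income"].str.replace(r"[$,]", "", regex=True).astype(float)

# Missing values: drop columns with more than 30% missing, then fill the rest
# with the median (assumes the values are missing completely at random).
df = df.loc[:, df.isna().mean() <= 0.30]
df["weight_kg"] = df["weight_kg"].fillna(df["weight_kg"].median())

# New feature: BMI = weight / height^2.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Log transformation for a right-skewed column (log1p also handles zeros).
df["log_income"] = np.log1p(df["income"])

# Outliers: clip to 1.5 * IQR beyond the quartiles (one common heuristic,
# chosen here as an assumption).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income_clipped"] = df["income"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Standardize: zero mean, unit (sample) standard deviation.
df["bmi_z"] = (df["bmi"] - df["bmi"].mean()) / df["bmi"].std()

# Binarise a count into a boolean flag.
df["violation_boolean"] = df["num_violations"] > 0

# Merge categories seen fewer than 2 times into "other".
counts = df["city"].value_counts()
rare = counts[counts < 2].index
df["city"] = df["city"].where(~df["city"].isin(rare), "other")

# One-hot encoding gives N columns; dummy encoding drops one and gives N-1.
one_hot = pd.get_dummies(df["city"], prefix="city")
dummy = pd.get_dummies(df["city"], prefix="city", drop_first=True)
```

In a real project the fill values, scaling statistics, and category cutoffs would normally be computed on the training split only and then reused on validation and test data.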
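
The text-processing steps in the list above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the sample documents and all function names are hypothetical, character_count is taken to exclude spaces, and TF-IDF is computed here as term frequency times `log(N / document_frequency)`, whereas libraries often use smoothed variants.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents.
docs = [
    "The cat sat on the mat!",
    "The dog sat on the log.",
    "Cats and dogs: best friends?",
]

def clean(text):
    """Steps 1-2: keep letters and whitespace only, then lowercase."""
    return re.sub(r"[^a-zA-Z\s]", "", text).lower()

def mean_word_length(text):
    """Step 3: characters in words divided by the number of words."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0

def bag_of_words(tokens):
    """Step 4: count how often each token appears in one document."""
    return Counter(tokens)

def ngrams(words, n):
    """Contiguous sequences of n words, each joined by a space."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def tf_idf(tokenized_docs):
    """Steps 5-6: term frequency times log(N / document frequency)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    scores = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        scores.append({
            tok: (c / len(doc)) * math.log(n_docs / doc_freq[tok])
            for tok, c in counts.items()
        })
    return scores

cleaned = [clean(d) for d in docs]
unigrams = [c.split() for c in cleaned]
bigrams = [ngrams(words, 2) for words in unigrams]

word_tfidf = tf_idf(unigrams)      # step 5: single words
bigram_tfidf = tf_idf(bigrams)     # step 6: 2-word sequences
```

With this weighting, a token that appears in fewer documents scores higher than an equally frequent token that appears in many, which is why "cat" outranks "the" in the first document.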

Source: Grepper

Tags: feature-engineering whatever