########### Splitting non-sequence data
train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)
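# Quick check (a minimal sketch, assuming `df` is an existing DataFrame):
# randomSplit assigns rows independently, so the 80/20 fractions are only approximate;
# counting the two splits shows the actual proportion.
n_train, n_test = train_df.count(), test_df.count()
print(n_train / (n_train + n_test))  # should be close to 0.8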
########### Splitting data with an inherent sequence (e.g. time series)
from pyspark.sql.functions import datediff, date_add, min as spark_min, max as spark_max
# Find the earliest and latest dates in the data, then how many days the data spans
bounds = df.agg(spark_min('DATE').alias('min_date'), spark_max('DATE').alias('max_date'))
range_in_days = bounds.select(datediff('max_date', 'min_date')).first()[0]  # number of days between the minimum and maximum date
# Find the date to split the dataset on
split_in_days = round(range_in_days * 0.8)  # 80% of the way through the date range
split_date = bounds.select(date_add('min_date', split_in_days)).first()[0]  # add the offset to the minimum date to get the split date
# Split the data into 80% train, 20% test
train_df = df.where(df['DATE'] < split_date)   # Keep only rows before the split date for training
test_df = df.where(df['DATE'] >= split_date)   # Keep rows on or after the split date for testing
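# Quick check (a sketch, assuming the split above): the training set should end
# strictly before the test set begins and hold roughly 80% of the rows
# (the exact share depends on how evenly rows are spread across dates).
print(train_df.agg({'DATE': 'max'}).first()[0], test_df.agg({'DATE': 'min'}).first()[0])
print(train_df.count() / df.count())  # roughly 0.8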