feat: kaggle refactor #489

you-n-g · 2024-11-15T10:05:39Z

Task: new kaggle mechanism; template from scratch

small size data

[[rdagent/scenarios/kaggle/tpl_ex/aerial-cactus-identification/main.py:235]]
[[rdagent/scenarios/kaggle/tpl_ex/aerial-cactus-identification/load_data.py:55]]

deprecated: [[rdagent/scenarios/kaggle/tpl_ex/aerial-cactus-identification/train.py:18]]

Sample data code:

from pathlib import Path

import fire
import pandas as pd
from rdagent.app.kaggle.conf import KAGGLE_IMPLEMENT_SETTING


def create_debug_data(competition="new-york-city-taxi-fare-prediction", min_frac=0.05, min_num=100):
    # Path to the competition's training CSV
    csv_path = Path(KAGGLE_IMPLEMENT_SETTING.Local_data_path) / competition / "train.csv"

    # Path where the original, full-size CSV is kept as a backup
    full_csv_path = csv_path.with_name("train.full.csv")

    # Sample only once: if the backup exists, train.csv is already the sampled version
    if not full_csv_path.exists():
        # Load the full CSV file
        df = pd.read_csv(csv_path)

        # Keep at least min_frac of the rows and at least min_num rows,
        # but never ask for more rows than the file contains
        frac = min(1.0, max(min_frac, min_num / len(df)))

        # Sample the data reproducibly
        df_sampled = df.sample(frac=frac, random_state=1)

        # Save the sampled data to a temporary CSV file
        sampled_csv_path = csv_path.with_name("train_sampled.csv")
        df_sampled.to_csv(sampled_csv_path, index=False)

        # Keep the original file as train.full.csv
        csv_path.rename(full_csv_path)

        # Move the sampled data into place as train.csv
        sampled_csv_path.rename(csv_path)


if __name__ == "__main__":
    fire.Fire(create_debug_data)
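
To make the sampling rule above easier to check, here is a minimal sketch of just the fraction computation, with no pandas or RD-Agent dependency. The helper name `sample_fraction` is hypothetical, and the cap at 1.0 is an assumption that guards against tables with fewer than `min_num` rows (sampling without replacement cannot return more rows than exist).

```python
def sample_fraction(n_rows: int, min_frac: float = 0.05, min_num: int = 100) -> float:
    """Fraction of rows to keep for a debug-sized dataset.

    Hypothetical helper: keeps at least `min_frac` of the rows and at least
    `min_num` rows, capped at 1.0 (the cap is an assumption of this sketch).
    """
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return min(1.0, max(min_frac, min_num / n_rows))


if __name__ == "__main__":
    # Large tables are dominated by min_frac, small tables by min_num.
    for n in (50, 1_000, 10_000):
        frac = sample_fraction(n)
        print(n, frac, round(n * frac))
```

For a 1,000-row table the `min_num` term wins (100 rows), while for a 10,000-row table the `min_frac` term wins (500 rows).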

Config

To run it successfully, we temporarily pull and use the default Kaggle image.
Here is the example config from Xiao:

KG_LOCAL_DATA_PATH=
KG_IF_USING_MLE_DATA=True
 
KG_DOCKER_BUILD_FROM_DOCKERFILE=False
KG_DOCKER_IMAGE="gcr.io/kaggle-gpu-images/python:latest"
KG_DOCKER_DEFAULT_ENTRY="sh -c 'python main.py; sleep 200'"
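
The config above is a plain `KEY=VALUE` list. As a rough illustration of how such lines can be inspected, the sketch below parses them with the standard library only. It is not RD-Agent's settings loader (that is not shown in this PR description); the helper names `parse_env_text` and `as_bool` are hypothetical.

```python
EXAMPLE = """\
KG_LOCAL_DATA_PATH=
KG_IF_USING_MLE_DATA=True

KG_DOCKER_BUILD_FROM_DOCKERFILE=False
KG_DOCKER_IMAGE="gcr.io/kaggle-gpu-images/python:latest"
KG_DOCKER_DEFAULT_ENTRY="sh -c 'python main.py; sleep 200'"
"""


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict (hypothetical helper, stdlib only)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching outer quotes, keeping inner quotes intact.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes"}


if __name__ == "__main__":
    settings = parse_env_text(EXAMPLE)
    for key, value in settings.items():
        print(f"{key} -> {value!r}")
    print("build from Dockerfile:", as_bool(settings["KG_DOCKER_BUILD_FROM_DOCKERFILE"]))
```

Note that `KG_LOCAL_DATA_PATH` is left empty in the example, so it parses to an empty string and has to be filled in before running.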

TODO:

Align the workspace path with Kaggle's; replace "kg_workspace".
Unzip the internal content in the package.

Description

Motivation and Context

How Has This Been Tested?

Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

Pipeline test:
Your own tests:

Types of changes

Fix bugs
Add new feature
Update documentation

📚 Documentation preview 📚: https://RDAgent--489.org.readthedocs.build/en/489/

you-n-g and others added 4 commits November 15, 2024 10:04

init trail

4581add

Add spec info

ee252ac

auto unzip mlebench prepared data for out scenario

c147be2

successfully run example

20125ac

you-n-g marked this pull request as draft November 19, 2024 03:42

you-n-g and others added 17 commits November 19, 2024 07:51

successfully run main

c426fd7

simplify load traing

7cdfa31

extract load_from_raw_data

a80bcf9

split the fies(still buggy)

ff8ac0c

It should stop on ~20 epoch and reach the end

some changes

39bd2dc

Fix bug to run example

d17b093

(success) until feature

49880aa

refine model and ensemble

7d46383

add metrics in ens.py

1acdb36

merge

c7313f7

update README & spec.md

dfea17f

ens change

91dc6a8

Merge branch 'kaggle_refactor' of github.com:microsoft/RD-Agent into …

ad879fa

…kaggle_refactor

fix ens bug

c0fb934

Delete rdagent/scenarios/kaggle/tpl_ex/aerial-cactus-identification/t…

133fa20

…rain.py

add template_path in KG_conf

ee95998

Merge branch 'kaggle_refactor' of github.com:microsoft/RD-Agent into …

feed191

…kaggle_refactor

XianBW marked this pull request as ready for review November 20, 2024 08:30

XianBW added 3 commits November 20, 2024 08:32

fix test kaggle

20f1f5a

CI

806b6bf

make test_import not check kaggle template codes

27b91eb

XianBW merged commit 1b057d0 into main Nov 20, 2024
8 checks passed

XianBW deleted the kaggle_refactor branch November 20, 2024 09:01

you-n-g mentioned this pull request Nov 15, 2024

chore(main): release 0.4.0 #454

Open
