Pandas reader framework

A framework for converting a file into a pandas DataFrame

Scenario

The purpose of this script is to produce a pre-processed DataFrame. The program is expected to follow the scenario below.

Read a file and convert it to a pandas.DataFrame.
Rename columns to the names used by the domain or database.
Change the values in a column.
Filter rows.

To achieve this, you don't need to know the pandas API. You only need to declare Field attributes in a Model class and run main.py to get the result.

Guide

Setup data file

Place the data file in './sources/'.

Then write the code in snapshot.py.

Import module

from pandas_reader.model import Model

Use case

Inherit from the Model class and define class attributes (instances of Field) on it.

class Major(Model):
    id = Field(index=True, auto_increment=True)
    name = Field("학부_과(전공)명")
    univ = Field("학교명")
    department = Field("단과대학명")
    investigation_year = Field("조사년도")
    sido_code = Field("지역", change=change_location_to_sido_code)
    status = Field("학과상태", change=change_status)
    std_clsf_name = Field("표준분류대계열")

Register the Model instance and the filename in config.py

from implement import Student

MODEL_INSTANCE = Student()
FILENAME = "students.csv"

FILETYPE = 'csv'
SOURCE_PATH = 'sources'
PANDAS_READ_OPTIONS = {
    'encoding': "cp949"
}

Run script

python main.py

You get the resulting DataFrame.

        name            univ  college  investigation_year sido_code  status std_clsf_name
1      멀티미디어통신학과        ICT폴리텍대학  단과대구분없음                2022        31  ACTIVE          공학계열
2        모바일통신학과        ICT폴리텍대학  단과대구분없음                2022        31  DELETE          공학계열
3        스마트통신학과        ICT폴리텍대학  단과대구분없음                2022        31  ACTIVE          공학계열
4         이동통신학과        ICT폴리텍대학  단과대구분없음                2022        31  ACTIVE          공학계열
5         정보보안학과        ICT폴리텍대학  단과대구분없음                2022        31  ACTIVE          공학계열
...          ...             ...      ...                 ...       ...     ...           ...
49625      기독교학과  횃불트리니티신학대학원대학교  단과대구분없음                2022        11  ACTIVE        인문사회계열
49626       목회학과  횃불트리니티신학대학원대학교  단과대구분없음                2022        11  ACTIVE        인문사회계열
49627        신학과  횃불트리니티신학대학원대학교  단과대구분없음                2022        11  ACTIVE        인문사회계열
49628     예배음악학과  횃불트리니티신학대학원대학교  단과대구분없음                2022        11  ACTIVE        인문사회계열
49629      일반신학과  횃불트리니티신학대학원대학교  단과대구분없음                2022        11  ACTIVE        인문사회계열

[49629 rows x 7 columns]

Document

Model

Manager uses the Model to rename columns and change values by inspecting its Field elements, so you need to know the properties of a model instance.

meta

The model's meta information should match its declaration.

get_fields

All fields are instances of Field.
The number of fields is the same as meta.get_size().

get_colnames

Column names match the Field variable names.
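One way a model could discover its fields is by scanning its class attributes. The sketch below is not the framework's actual implementation: SketchField and SketchModel are hypothetical stand-ins that only mirror the behaviour described above, where get_fields returns Field instances and get_colnames returns the attribute names.

```python
class SketchField:
    """Hypothetical stand-in for Field; stores only the source column name."""

    def __init__(self, target=None):
        self.target = target


class SketchModel:
    """Hypothetical stand-in for Model that collects fields from class attributes."""

    @classmethod
    def get_fields(cls):
        # vars(cls) preserves declaration order of the class attributes.
        return [v for v in vars(cls).values() if isinstance(v, SketchField)]

    @classmethod
    def get_colnames(cls):
        # The attribute names become the output column names.
        return [k for k, v in vars(cls).items() if isinstance(v, SketchField)]


class Major(SketchModel):
    name = SketchField("source_name")
    univ = SketchField("source_univ")


fields = Major.get_fields()
colnames = Major.get_colnames()
```

With this layout the number of fields and the number of column names always agree, which is the invariant the list above describes.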

Field

Fields are declared as class attributes of a Model, so you need to know the Field constructor arguments.

target: The name of the column in the source file. Default value is None.
change: A function that takes the target value as its parameter and returns the converted value. Default value is None.

def change_value(target):
    # Write your conversion logic here and return the converted value.
    ...

Field("Column1", change=change_value)
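As a concrete illustration, a change function might map raw source values to domain codes. The mapping below is invented for the example: the raw values "open" and "closed" and the name change_status_sketch are assumptions, not taken from the real data file or the framework.

```python
# Hypothetical mapping from raw source values to domain status codes.
STATUS_MAP = {"open": "ACTIVE", "closed": "DELETE"}


def change_status_sketch(target):
    """Return the domain code for a raw value, or the value unchanged if unknown."""
    return STATUS_MAP.get(target, target)


converted = [change_status_sketch(v) for v in ["open", "closed", "other"]]
```

Returning unknown values unchanged is a design choice for this sketch; raising an error instead would surface unexpected source data earlier.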

index: If True, the auto_increment and generator arguments are taken into account. Default value is False.
auto_increment: If True, the target column name is ignored and the values are replaced with auto-incrementing integers.
generator: If you have a function that generates random values, you can pass it here to create random IDs.

def random_generator():
    # Write your ID generation logic here and return the new ID.
    ...

Field("Column1", generator=random_generator)
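A generator could, for example, be built on the standard uuid module. This is only a sketch: it assumes the framework calls the generator with no arguments each time it needs an ID, which the signature above suggests but does not confirm.

```python
import uuid


def random_generator():
    """Return a random 32-character hexadecimal ID."""
    return uuid.uuid4().hex


# Simulate the generator being called once per row for three rows.
ids = [random_generator() for _ in range(3)]
```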

filter: Use this argument to keep only the values that satisfy a condition. The filter is applied after the change function has run.
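The ordering of change and filter matters, because the filter sees the converted value rather than the raw one. The sketch below illustrates that ordering in plain Python. It assumes the filter is a callable that receives one converted value and returns True to keep it; the function names and the year threshold are hypothetical.

```python
def change_year(target):
    """Convert a raw year string to an integer."""
    return int(target)


def recent_only(value):
    """Keep only years from 2020 onward (hypothetical condition)."""
    return value >= 2020


raw_values = ["2018", "2021", "2022"]
# Apply the change function first, then the filter, mirroring the order described above.
changed = [change_year(v) for v in raw_values]
kept = [v for v in changed if recent_only(v)]
```

If the filter ran first, the comparison would be applied to strings and fail, which is why the converted values are filtered.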

Manager

Get dataframe from file

You can change the default settings if needed. The keyword arguments of the _get_dataframe function are passed on to pandas.

Manager

Pre-processing the DataFrame takes several steps:

Remove columns that do not match a field
Rename columns
Add an index if the model has an index_field
Replace values
Filter rows
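The steps above can be sketched with plain pandas calls. This is an illustration of the sequence, not the Manager's actual code: the sample data, the column names, and the status mapping are all hypothetical.

```python
import pandas as pd

# Hypothetical raw data and model description (source column -> model column).
raw = pd.DataFrame(
    {
        "Source Name": ["a", "b", "c"],
        "Source Status": ["open", "closed", "open"],
        "Unused": [1, 2, 3],
    }
)
rename_map = {"Source Name": "name", "Source Status": "status"}
status_codes = {"open": "ACTIVE", "closed": "DELETE"}

# 1. Remove columns that do not match any field.
df = raw[list(rename_map)]
# 2. Rename columns to the model's names.
df = df.rename(columns=rename_map)
# 3. Add an auto-incrementing index starting at 1.
df.index = pd.RangeIndex(start=1, stop=len(df) + 1, name="id")
# 4. Replace values with a change mapping.
df["status"] = df["status"].map(status_codes)
# 5. Filter rows, here keeping only active entries.
df = df[df["status"] == "ACTIVE"]
```

Because the index is assigned before filtering, the surviving rows keep their original positions (1 and 3 here) rather than being renumbered.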

fetch

The fetch function is used by main.py. It accesses the Manager instance and returns the pre-processed DataFrame.

Config

The config file is where you set the file source, file name, and file type.

Requirements

The required settings in config.py are shown below.

MODEL_INSTANCE = None
FILENAME = None

The reference is here

Default

The default settings in config.py are shown below.

FILETYPE = 'csv'
SOURCE_PATH = 'sources'
PANDAS_READ_OPTIONS = {}
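The split between required and default settings can be pictured as a merge followed by a check. The helper below is hypothetical and only illustrates the idea; the framework may resolve its settings differently. The default values are the ones listed above.

```python
# Defaults taken from the text above; the validation helper itself is hypothetical.
DEFAULTS = {"FILETYPE": "csv", "SOURCE_PATH": "sources", "PANDAS_READ_OPTIONS": {}}
REQUIRED = ("MODEL_INSTANCE", "FILENAME")


def resolve_settings(user_settings):
    """Merge user settings over the defaults and check the required ones."""
    settings = {**DEFAULTS, **user_settings}
    missing = [key for key in REQUIRED if settings.get(key) is None]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    return settings


# object() stands in for a real Model instance in this sketch.
settings = resolve_settings({"MODEL_INSTANCE": object(), "FILENAME": "students.csv"})
```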

FILETYPE

The supported file types are listed below.

csv
excel
json

SOURCE_PATH

Data files should be saved in the source path directory under the project root. The default path is 'sources'.
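A file location can be derived from SOURCE_PATH and FILENAME with pathlib. This sketch assumes the script is run from the project root so that the relative path resolves correctly; how the framework itself builds the path is not shown in this document.

```python
from pathlib import Path

SOURCE_PATH = "sources"
FILENAME = "students.csv"

# Assumes the script is run from the project root, so the path is relative to it.
file_path = Path(SOURCE_PATH) / FILENAME
```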

PANDAS_READ_OPTIONS

You can pass any keyword argument options accepted by the pandas IO API.
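For example, an encoding option can be forwarded to the reader by unpacking the dictionary. The sketch below builds a small cp949-encoded CSV in memory so it needs no file on disk; the sample row is made up for illustration, and the direct call to pandas.read_csv stands in for whatever the framework does internally.

```python
import io

import pandas as pd

PANDAS_READ_OPTIONS = {"encoding": "cp949"}

# Build a small cp949-encoded CSV in memory so the example needs no file on disk.
csv_bytes = "학교명,조사년도\nICT폴리텍대학,2022\n".encode("cp949")

# The options dict is unpacked into keyword arguments of the reader.
df = pd.read_csv(io.BytesIO(csv_bytes), **PANDAS_READ_OPTIONS)
```

Without the encoding option, pandas would try UTF-8 by default and fail to decode these bytes.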
