Processing Large Datasets Quickly with Feather

Published 2019-04-14 19:04

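Feather is a fast, lightweight storage format for data frames, and it can be used with the pandas DataFrame data structure.

Reading and writing data

import feather
import pandas as pd


def read_csv_feature(file_in):
    # Read: load the CSV in chunks of 10,000 rows, then concatenate them
    chunk_size = 10000
    chunks = []
    with open(file_in, encoding='utf-8') as f:
        reader = pd.read_csv(f, sep=',', iterator=True)
        while True:
            try:
                chunks.append(reader.get_chunk(chunk_size))
            except StopIteration:
                print('Iteration is stopped')
                break
    return pd.concat(chunks, ignore_index=True)


def write_csv_feature(file_in, file_out):
    # Write: convert the CSV file into a Feather file
    df = read_csv_feature(file_in)
    print(df.count())
    # Call the module function directly. Assigning the result to a local
    # variable named `feather` would shadow the module inside this function
    # and raise UnboundLocalError.
    feather.write_dataframe(df, file_out)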
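
The same CSV-to-Feather conversion can also be sketched with pandas alone, including reading the Feather file back. This is a minimal sketch, not the post's original method: it assumes pyarrow is installed (pandas' `to_feather` and `read_feather` depend on it), and the function names `read_csv_in_chunks`, `csv_to_feather`, and `load_feather` are hypothetical names chosen for illustration.

```python
import pandas as pd


def read_csv_in_chunks(csv_path, chunk_size=10000):
    """Load a CSV file chunk by chunk and return one combined DataFrame."""
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file in a single call.
    reader = pd.read_csv(csv_path, sep=',', encoding='utf-8', chunksize=chunk_size)
    chunks = list(reader)
    # ignore_index=True rebuilds a default 0..n-1 index, which to_feather expects.
    return pd.concat(chunks, ignore_index=True)


def csv_to_feather(csv_path, feather_path, chunk_size=10000):
    """Convert a CSV file to a Feather file and return the DataFrame written."""
    df = read_csv_in_chunks(csv_path, chunk_size)
    df.to_feather(feather_path)  # assumption: pyarrow is available
    return df


def load_feather(feather_path):
    """Read a Feather file back into a DataFrame."""
    return pd.read_feather(feather_path)
```

A typical use would be to call `csv_to_feather('data.csv', 'data.feather')` once, then use `load_feather('data.feather')` in later runs so the CSV does not have to be parsed again; both file names here are placeholders.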