
Loading Data with the PyTorch DataLoader

Passing batch_size to DataLoader splits the input data into batches, which are then fed to the network one batch at a time during training.

Reading the npy data

Python
import numpy as np

X_train = np.load("./X_train.npy")
y_train = np.load("./y_train.npy")
X_test = np.load("./X_test.npy")
y_test = np.load("./y_test.npy")
print("Training samples: ", X_train.shape[0])
print("Testing samples: ", X_test.shape[0])

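Before converting to tensors, it can help to check the dtype and per-sample size of the loaded arrays. The sketch below uses a small synthetic array, `X_demo`, as a hypothetical stand-in for `X_train.npy` (the real files are not reproduced here). It relies on two facts: reshaping to `(N, 1, 9, 30)` needs exactly 270 values per sample, and `torch.from_numpy` keeps the NumPy dtype, so float64 arrays become float64 tensors unless they are cast first.

```python
import numpy as np
import torch

# Hypothetical stand-in for X_train.npy: 4 samples with 270 values each.
rng = np.random.default_rng(0)
X_demo = rng.random((4, 270))  # NumPy produces float64 here

print(X_demo.shape, X_demo.dtype)

# Reshaping to (N, 1, 9, 30) requires 1 * 9 * 30 = 270 values per sample.
values_per_sample = X_demo.size // len(X_demo)
print(values_per_sample == 1 * 9 * 30)

# torch.from_numpy preserves the NumPy dtype.
t64 = torch.from_numpy(X_demo)
# Casting to float32 first matches the default dtype of nn.Module parameters.
t32 = torch.from_numpy(X_demo.astype(np.float32))
print(t64.dtype, t32.dtype)
```

Whether the cast to float32 is needed depends on how the arrays were saved and on the model they are fed to.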

Converting the data to tensors

Python
import torch
from torch.utils.data import DataLoader, Dataset

# np.load already returns ndarrays, so torch.from_numpy can be applied directly.
# Inputs are reshaped to (N, 1, 9, 30): one channel of 9 x 30 values per sample.
trainx = torch.from_numpy(X_train).reshape(len(X_train), 1, 9, 30)
trainy = torch.from_numpy(y_train).reshape(len(y_train), 1)  # labels for regression
testx = torch.from_numpy(X_test).reshape(len(X_test), 1, 9, 30)
testy = torch.from_numpy(y_test).reshape(len(y_test), 1)
batch_size = 1000


class Factor_data(Dataset):
    def __init__(self, train_x, train_y):  # inputs are assumed to already be tensors
        self.len = len(train_x)
        self.x_data = train_x
        self.y_data = train_y

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len


# put into data loader
train_data = Factor_data(trainx, trainy)
train_loader = DataLoader(
    dataset=train_data, batch_size=batch_size, shuffle=False
)  # keep the original sample order (no shuffling)
test_data = Factor_data(testx, testy)
test_loader = DataLoader(
    dataset=test_data, batch_size=batch_size, shuffle=False
)  # keep the original sample order (no shuffling)
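Because Factor_data only indexes into two tensors, PyTorch's built-in `TensorDataset` could probably serve the same purpose. The following is a minimal sketch under that assumption, not the author's method; `demo_x` and `demo_y` are hypothetical stand-ins for `trainx` and `trainy`, and the batch size of 4 is chosen only to keep the output short.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for trainx / trainy (6 samples).
demo_x = torch.randn(6, 1, 9, 30)
demo_y = torch.randn(6, 1)

# TensorDataset returns (demo_x[i], demo_y[i]) for index i.
demo_data = TensorDataset(demo_x, demo_y)
demo_loader = DataLoader(demo_data, batch_size=4, shuffle=False)

x0, y0 = demo_data[0]
print(torch.equal(x0, demo_x[0]), torch.equal(y0, demo_y[0]))

# Collect the shape of every batch the loader yields.
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in demo_loader]
print(batch_shapes)
```

A custom Dataset subclass remains the better fit once per-sample logic such as transforms or lazy loading is needed.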

Inspecting the data in train_loader

Python
for data, label in train_loader:
    print(data.shape)
    print(label.shape)


The output shows that the batches cover exactly 11825 samples in total: the first 11 batches each contain 1000 samples (because batch_size=1000), and the last batch contains the remaining 825.
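The batch counts above follow from simple arithmetic: the number of batches is ceil(11825 / 1000) = 12, and the last batch holds 11825 mod 1000 = 825 samples. The sketch below checks this on placeholder data (a tensor of zeros named `dummy`, which is an assumption for illustration and not the real dataset) and also shows the `drop_last=True` option, which discards the incomplete final batch.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

n_samples, batch_size = 11825, 1000

# Placeholder data with the same number of samples as the training set.
dummy = TensorDataset(torch.zeros(n_samples, 1))

loader = DataLoader(dummy, batch_size=batch_size, shuffle=False)
sizes = [batch[0].shape[0] for batch in loader]

print(len(loader), math.ceil(n_samples / batch_size))  # 12 12
print(sizes[-1], n_samples % batch_size)               # 825 825

# drop_last=True discards the final, smaller batch.
loader_full = DataLoader(dummy, batch_size=batch_size, drop_last=True)
print(len(loader_full))                                # 11
```

Dropping the last batch can be useful when a model or loss assumes a fixed batch size, at the cost of skipping 825 samples in each pass.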
