Next we quantize with TFLite. To be honest, I'm not sure whether the intermediate data flow actually gets quantized...

1. Data preparation

# 1. Data preparation
import tensorflow as tf
import numpy as np

mnist = tf.keras.datasets.mnist
img_rows,img_cols = 28,28
(x_train_, y_train_), (x_test_, y_test_) = mnist.load_data()
x_train = x_train_.reshape(x_train_.shape[0],img_rows,img_cols,1)
x_test = x_test_.reshape(x_test_.shape[0],img_rows,img_cols,1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train = x_train / 256  # scale pixel values into [0, 1)
x_test = x_test / 256
y_train_onehot = tf.keras.utils.to_categorical(y_train_)
y_test_onehot = tf.keras.utils.to_categorical(y_test_)

2. Loading the original model

# Load the pre-trained Keras model
model = tf.keras.models.load_model('models/mnist_tf2_fw.h5')
score = model.evaluate(x_test, y_test_onehot, verbose=0)
print('Test accuracy:', "{:.5f}".format(score[1]))

3. Loading the quantized model

interpreter = tf.lite.Interpreter(model_path="models/tflite_tf2_8.tflite")
#interpreter = tf.lite.Interpreter(model_path="models/tflite_tf2_dy.tflite")
#interpreter = tf.lite.Interpreter(model_path="models/tflite_tf2_16.tflite")
#interpreter = tf.lite.Interpreter(model_path="models/tflite_tf2_32.tflite")
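The four `.tflite` files above are not generated in this post, so as a sketch only: post-training quantization with `TFLiteConverter` could plausibly produce them as below. The tiny stand-in model and the random calibration data are assumptions, since the original converter settings are not shown.

```python
import numpy as np
import tensorflow as tf

# Stand-in model (assumption; the post uses models/mnist_tf2_fw.h5 instead).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Float32 baseline: plain conversion, no optimizations.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_32 = converter.convert()

# Dynamic-range quantization: int8 weights, float activations.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dy = converter.convert()

# Float16 weight quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_16 = converter.convert()

# Full-integer (8-bit) quantization calibrates activation ranges with a
# representative dataset (random calibration data here, purely illustrative).
def representative_data():
    for _ in range(10):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
tflite_8 = converter.convert()

# Each convert() returns bytes; save e.g. with
# open('models/tflite_tf2_dy.tflite', 'wb').write(tflite_dy)
```

Each converted model is a `bytes` blob; the dynamic-range variant is noticeably smaller than the float32 baseline because the weights are stored as int8.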

interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

#print(input_details)
#print(output_details)
y_label = y_test_onehot.argmax(1)

# Quantized model inference
pre = []
# TFLite targets mobile deployment; the default interpreter processes one sample per invoke
for i in range(len(x_test)):
    # the .astype(np.float32) cast is essential; this bug took a long time to track down
    x_test1 = x_test[i].reshape(1, 28, 28, 1).astype(np.float32)
    interpreter.set_tensor(input_details[0]['index'], x_test1)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    pre.append(output_data.argmax())

acc = np.mean(np.array(pre) == y_label)  # avoid hard-coding the test-set size
print(acc)
Test accuracy (%):

32-bit    16-bit    8-bit     default
98.500    98.500    98.440    98.440
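Incidentally, the one-sample-per-invoke loop is not mandatory in Python: the interpreter's input tensor can be resized to a whole batch before allocation. A sketch with a stand-in model, since the post's own `.tflite` files are not reproduced here:

```python
import numpy as np
import tensorflow as tf

# Stand-in model converted in memory (assumption; any .tflite file whose
# input supports a dynamic batch dimension works the same way).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize the input from [1, 28, 28, 1] to a batch of 100, then re-allocate.
batch = np.random.rand(100, 28, 28, 1).astype(np.float32)
interpreter.resize_tensor_input(input_details[0]['index'], list(batch.shape))
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], batch)
interpreter.invoke()
probs = interpreter.get_tensor(output_details[0]['index'])  # shape (100, 10)
```

This runs all 100 samples in a single `invoke()`, which is much faster than the per-sample loop when evaluating on a desktop.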

These results look about the same as weight-only quantization. Quantizing the data flow is indeed not easy; it requires lower-level, finer-grained control. Only RRAM training is likely to mix 4-bit and 8-bit precision, so training with a low-precision data flow is still quite important.
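One way to check whether the data flow is actually quantized is to inspect each tensor's dtype and quantization parameters inside the interpreter. A sketch using a dynamic-range-quantized stand-in model (an assumption; the post's own models can be inspected the same way by loading them with `model_path=`):

```python
import numpy as np
import tensorflow as tf

# Stand-in model, dynamic-range quantized in memory.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
interpreter = tf.lite.Interpreter(model_content=converter.convert())
interpreter.allocate_tensors()

# List every tensor's dtype and its (scale, zero_point) quantization params.
for t in interpreter.get_tensor_details():
    scale, zero_point = t['quantization']
    print(t['name'], t['dtype'].__name__, 'scale =', scale)

dtypes = {t['dtype'] for t in interpreter.get_tensor_details()}
# Under dynamic-range quantization the weight tensors are stored as int8,
# while the activation tensors remain float32, i.e. the data flow itself
# is not integer-quantized in this mode.
```

For the full-integer (8-bit) model, the same inspection should instead show int8 activation tensors with nonzero scales.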

Future plan: implement low-bit, data-flow quantization, study other people's work, and lay the theoretical groundwork for high-performance RRAM computing. GitHub code

Tags: quantization-bit, keras, tflite
