陌路茶色/

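Implementing TextCNN with Estimator

TensorFlow 2.x seems to promote tf.keras as its main API. Here I first implement TextCNN with an Estimator in TensorFlow 2.0, then with Keras, and finally with a TensorFlow 1.x Estimator (the version my company currently uses). The main goal is to get familiar with the APIs.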

TextCNN with a TensorFlow 2.x Estimator

TFRecord

TextCNN with TensorFlow 2.x Keras

Text processing in Keras

tf.keras.preprocessing.text.Tokenizer
tf.keras.preprocessing.sequence.pad_sequences
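The Tokenizer defines how strings are split, the vocabulary size, which characters are filtered out, whether character-level mode is used, the OOV token, and so on.
For example, the following defines a character-level tokenizer with "<UNK>" as the OOV token: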

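import tensorflow as tf

# Character-level tokenizer; characters not in the vocabulary map to "<UNK>"
tokenizer = tf.keras.preprocessing.text.Tokenizer(char_level=True, oov_token="<UNK>")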

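Feed data to the tokenizer. It can be fed incrementally: each call adds newly seen tokens to the vocabulary. Note that the word index is rebuilt by frequency on every call, so existing indices can shift; convert texts to indices only after all fit calls are done.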

tokenizer.fit_on_texts(docs)
tokenizer.fit_on_texts(docs_test)

Convert to indices

x=tokenizer.texts_to_sequences(docs)
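
To make the fit and convert steps above concrete, here is a minimal plain-Python sketch of a character-level vocabulary. It is not the Keras implementation: `CharVocab`, `fit`, and `encode` are hypothetical names, and the sketch only assumes the behaviour described here (index 0 left free for padding, index 1 for the OOV token, remaining characters ranked by accumulated frequency, text lower-cased).

```python
from collections import Counter


class CharVocab:
    """Toy character-level vocabulary (hypothetical helper, not the Keras class)."""

    def __init__(self, oov_token="<UNK>"):
        self.oov_token = oov_token
        self.counts = Counter()
        self.index = {oov_token: 1}  # index 0 is left free for padding

    def fit(self, texts):
        # Accumulate character counts across calls, then rebuild the whole index.
        for text in texts:
            self.counts.update(text.lower())
        # Most frequent first; ties keep first-seen order (sorted is stable).
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        self.index = {self.oov_token: 1}
        for i, (ch, _) in enumerate(ranked, start=2):
            self.index[ch] = i

    def encode(self, texts):
        # Unknown characters fall back to the OOV index.
        oov = self.index[self.oov_token]
        return [[self.index.get(ch, oov) for ch in text.lower()] for text in texts]


vocab = CharVocab()
vocab.fit(["ab", "a"])
first = dict(vocab.index)      # index after the first batch
vocab.fit(["ccc"])
second = dict(vocab.index)     # index after the second, incremental batch
encoded = vocab.encode(["ad"])  # "d" was never seen, so it maps to the OOV index
print(first, second, encoded)
```

Comparing `first` and `second` shows why encoding should wait until every batch has been fed in: a frequent new character can push older characters to different indices.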

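Padding
The padding argument controls whether padding is added at the front ('pre') or at the end ('post') of a sequence; the truncating argument controls whether an over-long sequence is cut at the front or at the end.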

x_pad=tf.keras.preprocessing.sequence.pad_sequences(x,maxlen=30,padding='post',truncating='post')
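
The effect of the two arguments can be pictured with a small plain-Python sketch. This is an illustration of the semantics only, not the Keras code; `pad` is a hypothetical helper, and its defaults ('pre' for both arguments, fill value 0) are chosen to mirror the `pad_sequences` defaults.

```python
def pad(seqs, maxlen, padding="pre", truncating="pre", value=0):
    """Pad or truncate each sequence to exactly maxlen items (assumes maxlen >= 1)."""
    out = []
    for seq in seqs:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops items from the front, 'post' drops items from the end.
            seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # 'pre' puts the fill values in front, 'post' puts them at the end.
        out.append(fill + seq if padding == "pre" else seq + fill)
    return out


seqs = [[1, 2, 3, 4, 5], [6, 7]]
post = pad(seqs, maxlen=4, padding="post", truncating="post")
pre = pad(seqs, maxlen=4)
print(post)
print(pre)
```

With 'post' for both arguments, as in the call above, the start of every text is preserved and the zeros sit at the end, which is usually the more readable layout for a CNN over short texts.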

Single-machine multi-device (multi-GPU) training

tf.distribute.MirroredStrategy
Check the number of devices:

strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

Reference: Distributed training with Keras

Using TensorFlow 2.0
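
As a rough sketch of how the strategy is typically used with Keras, the model can be created and compiled inside `strategy.scope()` so that its variables are mirrored across the available devices. The `build_textcnn` function and all of its hyperparameters (vocabulary size, kernel sizes, filter count, number of classes) are assumptions made for illustration, not the model from this post.

```python
import tensorflow as tf


def build_textcnn(vocab_size, maxlen=30, embed_dim=64, num_classes=2):
    """Small TextCNN: parallel Conv1D branches with max-over-time pooling (illustrative)."""
    inputs = tf.keras.Input(shape=(maxlen,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)
    pooled = []
    for kernel_size in (2, 3, 4):  # assumed kernel sizes
        conv = tf.keras.layers.Conv1D(32, kernel_size, activation="relu")(x)
        pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))
    x = tf.keras.layers.Concatenate()(pooled)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


strategy = tf.distribute.MirroredStrategy()

# Variables created inside the scope are replicated on every device.
with strategy.scope():
    model = build_textcnn(vocab_size=5000)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
```

After this, `model.fit` can be called as usual; on a machine without multiple GPUs the strategy simply runs with a single replica.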

summary

Print the network structure and parameters with model.summary(); see: Recurrent Neural Networks (RNN) with Keras

Reference: How to Prepare Text Data for Deep Learning with Keras
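
A minimal sketch of printing the structure and parameter counts of a Keras model follows. The tiny model here is made up purely for illustration; its layer sizes are assumptions.

```python
import tensorflow as tf

# A deliberately tiny model so the parameter counts are easy to check by hand.
inputs = tf.keras.Input(shape=(30,), dtype="int32")
x = tf.keras.layers.Embedding(100, 8)(inputs)   # 100 * 8 weights
x = tf.keras.layers.Conv1D(4, 3)(x)             # 3 * 8 * 4 weights + 4 biases
x = tf.keras.layers.GlobalMaxPooling1D()(x)     # no parameters
outputs = tf.keras.layers.Dense(2)(x)           # 4 * 2 weights + 2 biases
model = tf.keras.Model(inputs, outputs)

# Prints one row per layer with its output shape and parameter count.
model.summary()
```

`model.count_params()` returns the total as a number when only the count is needed.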

Reference

https://crosleythomas.github.io/blog/posts/tf_start_to_finish/introduction
https://www.datacamp.com/community/tutorials/pickle-python-tutorial
https://docs.python.org/3/library/collections.html
https://www.tensorflow.org/tutorials/load_data/tfrecord
https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator
https://www.tensorflow.org/tutorials/distribute/multi_worker_with_estimator
https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams
https://pyformat.info/
https://stackoverflow.com/questions/55081911/tensorflow-2-0-0-alpha0-tf-logging-set-verbosity
https://stackoverflow.com/questions/45353389/printing-extra-training-metrics-with-tensorflow-estimator
https://stackoverflow.com/questions/35911252/disable-tensorflow-debugging-information
https://blog.yyliu.net/remote-tensorboard/
https://www.tensorflow.org/api_docs/python/tf/estimator/add_metrics
