https://mccormickml.com/2019/07/22/BERT-fine-tuning/
You will need to make a couple of changes to the code to get this to work optimally:
1. When encoding your inputs with tokenizer.encode_plus(), you must explicitly set truncation=True (newer versions of the library warn if truncation is not explicitly enabled)
2. Set the number of epochs to 3 (4 is too many and tends to overfit)
Hope this is useful - it should be possible to modify this code to work with your own data by changing the dataframe columns and labels as appropriate.
For quicker inference at run time, you can modify the code to use the smaller DistilBERT model, also from Hugging Face, or one of the other BERT-style models for sequence classification, e.g.:
https://huggingface.co/transformers/v2.2.0/model_doc/distilbert.html (DistilBertForSequenceClassification, DistilBertConfig, DistilBertTokenizer) - tokenizer/model is 'distilbert-base-uncased'
https://huggingface.co/transformers/v2.2.0/model_doc/roberta.html (RobertaForSequenceClassification, RobertaConfig, RobertaTokenizer) - tokenizer/model is 'roberta-base'
https://huggingface.co/transformers/v2.2.0/model_doc/albert.html (AlbertForSequenceClassification, AlbertConfig, AlbertTokenizer) - tokenizer/model is 'albert-base-v2'
Phil
On Sat, Aug 29, 2020 at 10:02 AM s.z. aftabi <s.z.aftabi at gmail.com> wrote:
> Dear all,
>
> I'm currently working on a two-sentence classification task.
> I aim to use BERT as an embedding layer with a neural network on top (an NN
> with more than one layer), then train the NN model and also fine-tune BERT,
> both with respect to the classification loss.
> I would appreciate your help finding a good reference or implementation for
> how to do that.
>
> Best regards,
> S.Zahra
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora
>