From cslt Wiki
Revision as of 01:58, 22 August 2019 by Cslt
- Collect audio data of 1,000 Chinese celebrities.
- Automatically clip videos through a pipeline of face detection, face recognition, speaker validation and speaker diarization.
- Create a database.
- Expand the database to 10,000 people.
- Build an LSTM-based model that links SyncNet and speaker diarization and learns the relationship between them.
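The clipping step ends with a per-frame decision: is the target celebrity's face on screen, and is that face actually the one speaking? The following is a minimal sketch, assuming face recognition and SyncNet have already produced one identity-match flag and one sync-confidence value per video frame, of how those decisions could be merged into clip boundaries. `select_segments`, `sync_threshold` and `min_duration` are hypothetical names and values, not part of the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds


def select_segments(face_match: List[bool], sync_conf: List[float], fps: float,
                    sync_threshold: float = 3.0,  # hypothetical cut-off
                    min_duration: float = 1.0) -> List[Segment]:
    """Return spans where the target face is present AND lip sync is confident."""
    if len(face_match) != len(sync_conf):
        raise ValueError("per-frame inputs must have the same length")
    if fps <= 0:
        raise ValueError("fps must be positive")
    keep = [m and c >= sync_threshold for m, c in zip(face_match, sync_conf)]
    segments: List[Segment] = []
    start = None
    # A trailing False sentinel closes a run that reaches the last frame.
    for i, k in enumerate(keep + [False]):
        if k and start is None:
            start = i
        elif not k and start is not None:
            seg = Segment(start / fps, i / fps)
            if seg.end - seg.start >= min_duration:  # drop clips that are too short
                segments.append(seg)
            start = None
    return segments
```

Requiring both signals is what separates "the celebrity is visible" from "the celebrity is talking"; the minimum duration discards fragments too short to be useful as speaker-recognition training data.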
- Implementation in TensorFlow, PyTorch, Keras and MXNet.
- Models: RetinaFace and ArcFace for face detection and recognition, SyncNet for speaker recognition, and UIS-RNN for speaker diarization.
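SyncNet (Chung et al., 2016) ties a face track to the audio: it embeds short video and audio windows into a shared space, and the audio-visual offset is the shift with the smallest average distance. Its confidence score is the median distance over all candidate shifts minus that minimum. Below is a minimal pure-Python sketch of this scoring step, assuming per-frame embeddings have already been produced by the network. The function names and the sign convention for the offset are our own and may differ from the reference code.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sync_offset(video: List[Vector], audio: List[Vector],
                max_shift: int) -> Tuple[int, float]:
    """Return (offset, confidence); a positive offset means audio lags video."""
    if max_shift < 0 or max_shift >= min(len(video), len(audio)):
        raise ValueError("every candidate shift must leave overlapping frames")
    shifts = range(-max_shift, max_shift + 1)
    mean_dists = []
    for shift in shifts:
        # Compare video frame t with audio frame t + shift wherever both exist.
        dists = [euclidean(video[t], audio[t + shift])
                 for t in range(len(video)) if 0 <= t + shift < len(audio)]
        mean_dists.append(sum(dists) / len(dists))
    best = min(range(len(mean_dists)), key=mean_dists.__getitem__)
    # Confidence: how far the best shift stands out from a typical shift.
    confidence = statistics.median(mean_dists) - mean_dists[best]
    return shifts[best], confidence
```

A low confidence suggests the visible face is not the one producing the audio (for example an interviewer's voice over the celebrity's face), which is the case the speaker-validation step needs to reject.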
- Deng et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", 2018.
- Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition", 2018.
- Liu et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition", 2017.
- Zhong et al., "GhostVLAD for set-based face recognition", 2018.
- Chung et al., "Out of time: automated lip sync in the wild", 2016.
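ArcFace (Deng et al.) adds an angular margin m to the angle between a normalized embedding and its own class weight, then scales all cosines by s before the softmax: the target logit is s·cos(θ_y + m), the others are s·cos θ_j. The following is a minimal pure-Python sketch of that loss for a single sample. The defaults s = 64 and m = 0.5 follow the paper. Production implementations also add a fallback for θ_y + m > π, which this sketch omits.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def arcface_loss(embedding: Sequence[float], class_weights: List[Sequence[float]],
                 label: int, s: float = 64.0, m: float = 0.5) -> float:
    """Additive angular margin loss for one sample."""
    x = normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(x, normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos in its domain
        if j == label:
            logits.append(s * math.cos(math.acos(cos_theta) + m))  # margin on target only
        else:
            logits.append(s * cos_theta)
    # Cross-entropy via a numerically stable log-sum-exp.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[label]
```

With m = 0 this reduces to a softmax over scaled cosines. CosFace (Wang et al.) instead places the margin on the cosine, s·(cos θ_y − m), and SphereFace (Liu et al.) multiplies the angle, cos(m·θ_y).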