Exploration of a ViT-based multimodal approach to Vehicle Accident Detection

Multimodal Deep Learning (MMDL) has emerged as a potent framework for synthesizing information from diverse data sources, enhancing the capability of models to understand and predict complex phenomena. In particular, Vision Transformers (ViT) have shown promising results in processing visual data alongside other modalities for comprehensive analysis. This study aims to investigate the integration of MMDL and ViT in the context of traffic accident detection, addressing the critical need for advanced predictive models in this domain. Through a literature review, we assess the current landscape of MMDL applications and highlight the evolution and challenges of multimodal learning. Building on these insights, we propose a novel MMDL architecture designed to leverage video, audio, and metadata for accurate and timely accident detection. Our methodology combines a structured review of recent MMDL research with a theoretical approach to architecture design, emphasizing the fusion of multimodal data through ViT. The review adheres to established guidelines for systematic reviews, focusing on advancements from 2019 to 2023, while the architecture design is grounded in a thorough analysis of modalities relevant to traffic incidents. The main contributions include a taxonomy of MMDL methods and a ViT-based architecture for enhancing traffic safety systems. Integrating multimodal data through advanced deep learning models can improve the prediction accuracy of traffic accident detection. This research underscores the potential of MMDL and ViT in developing robust, real-time monitoring systems, marking a step forward in the application of artificial intelligence for public safety and smart city initiatives.
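The record contains no implementation, so the following is only a minimal PyTorch sketch of the kind of token-level fusion the abstract describes: video-frame patches (ViT-style), audio features, and scalar metadata projected into a shared token space and classified by a shared Transformer encoder. All module names, dimensions, and the fusion strategy are illustrative assumptions, not the architecture proposed in the thesis.

# Minimal sketch of a ViT-style multimodal fusion classifier (PyTorch).
# Dimensions, module names, and the token-level fusion strategy are
# illustrative assumptions; positional embeddings are omitted for brevity.
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and project them to d_model."""
    def __init__(self, patch_size=16, in_ch=3, d_model=256):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, d_model, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, 3, H, W)
        x = self.proj(x)                        # (B, d_model, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)     # (B, num_patches, d_model)


class MultimodalAccidentDetector(nn.Module):
    """Fuses video-frame patches, audio features, and metadata tokens with a
    shared Transformer encoder and classifies accident vs. normal traffic."""
    def __init__(self, d_model=256, n_heads=8, n_layers=4,
                 audio_dim=128, meta_dim=8, n_classes=2):
        super().__init__()
        self.patch_embed = PatchEmbed(d_model=d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)    # one token per audio frame
        self.meta_proj = nn.Linear(meta_dim, d_model)      # single metadata token
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=4 * d_model,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, frame, audio, meta):
        # frame: (B, 3, 224, 224); audio: (B, T_audio, audio_dim); meta: (B, meta_dim)
        tokens = [
            self.cls_token.expand(frame.size(0), -1, -1),   # (B, 1, d)
            self.patch_embed(frame),                        # (B, 196, d)
            self.audio_proj(audio),                         # (B, T_audio, d)
            self.meta_proj(meta).unsqueeze(1),              # (B, 1, d)
        ]
        x = torch.cat(tokens, dim=1)            # joint multimodal token sequence
        x = self.encoder(x)                     # cross-modal self-attention
        return self.head(x[:, 0])               # classify from the [CLS] token


if __name__ == "__main__":
    model = MultimodalAccidentDetector()
    frame = torch.randn(2, 3, 224, 224)         # one sampled frame per clip
    audio = torch.randn(2, 50, 128)             # e.g. 50 log-mel feature frames
    meta = torch.randn(2, 8)                    # e.g. speed, time of day, weather codes
    print(model(frame, audio, meta).shape)      # torch.Size([2, 2])

In this sketch every modality is reduced to tokens in a common embedding space so that a single encoder performs the fusion; the actual modality encoders, fusion stage, and hyperparameters chosen in the thesis may differ.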

Full description

Authors:
Ríos Pérez, Jesús David
Resource type:
Publication date:
2024
Institution:
Universidad del Magdalena
Repository:
Repositorio Unimagdalena
Language:
eng
OAI Identifier:
oai:repositorio.unimagdalena.edu.co:123456789/21215
Online access:
https://repositorio.unimagdalena.edu.co/handle/123456789/21215
Keywords:
Multimodal, Machine Learning, Data Fusion, Deep Learning.
Multimodalidad, Aprendizaje de máquinas, Fusión de datos, Aprendizaje profundo.
Rights
openAccess
License
Open Access (Acceso Abierto)
id UNIMAGDALE_c117eeb271af41a8cc250b22f92c3520
oai_identifier_str oai:repositorio.unimagdalena.edu.co:123456789/21215
network_acronym_str UNIMAGDALE
network_name_str Repositorio Unimagdalena
repository_id_str
dc.title.none.fl_str_mv Exploration of a ViT-based multimodal approach to Vehicle Accident Detection
dc.title.alternative.none.fl_str_mv Exploración de un enfoque multimodal basado en ViT para la Detección de Accidentes Vehiculares
title Exploration of a ViT-based multimodal approach to Vehicle Accident Detection
spellingShingle Exploration of a ViT-based multimodal approach to Vehicle Accident Detection
Multimodal, Machine Learning, Data Fusion, Deep Learning.
Multimodalidad, Aprendizaje de máquinas, Fusión de datos, Aprendizaje profundo.
title_short Exploration of a ViT-based multimodal approach to Vehicle Accident Detection
title_full Exploration of a ViT-based multimodal approach to Vehicle Accident Detection
title_fullStr Exploration of a ViT-based multimodal approach to Vehicle Accident Detection
title_full_unstemmed Exploration of a ViT-based multimodal approach to Vehicle Accident Detection
title_sort Exploration of a ViT-based multimodal approach to Vehicle Accident Detection
dc.creator.fl_str_mv Ríos Pérez, Jesús David
dc.contributor.advisor.none.fl_str_mv Sánchez Torres, Germán
Henriquez Miranda, Carlos Nelson
dc.contributor.author.none.fl_str_mv Ríos Pérez, Jesús David
dc.contributor.sponsor.none.fl_str_mv Grupo de investigación y Desarrollo en Sistemas y Computación (GIDSYC)
dc.subject.none.fl_str_mv Multimodal, Machine Learning, Data Fusion, Deep Learning.
Multimodalidad, Aprendizaje de máquinas, Fusión de datos, Aprendizaje profundo.
topic Multimodal, Machine Learning, Data Fusion, Deep Learning.
Multimodalidad, Aprendizaje de máquinas, Fusión de datos, Aprendizaje profundo.
description Multimodal Deep Learning (MMDL) has emerged as a potent framework for synthesizing information from diverse data sources, enhancing the capability of models to understand and predict complex phenomena. In particular, Vision Transformers (ViT) have shown promising results in processing visual data alongside other modalities for comprehensive analysis. This study aims to investigate the integration of MMDL and ViT in the context of traffic accident detection, addressing the critical need for advanced predictive models in this domain. Through a literature review, we assess the current landscape of MMDL applications and highlight the evolution and challenges of multimodal learning. Building on these insights, we propose a novel MMDL architecture designed to leverage video, audio, and metadata for accurate and timely accident detection. Our methodology combines a structured review of recent MMDL research with a theoretical approach to architecture design, emphasizing the fusion of multimodal data through ViT. The review adheres to established guidelines for systematic reviews, focusing on advancements from 2019 to 2023, while the architecture design is grounded in a thorough analysis of modalities relevant to traffic incidents. The main contributions include a taxonomy of MMDL methods and a ViT-based architecture for enhancing traffic safety systems. Integrating multimodal data through advanced deep learning models can improve the prediction accuracy of traffic accident detection. This research underscores the potential of MMDL and ViT in developing robust, real-time monitoring systems, marking a step forward in the application of artificial intelligence for public safety and smart city initiatives.
publishDate 2024
dc.date.accessioned.none.fl_str_mv 2024-07-11T13:43:30Z
dc.date.available.none.fl_str_mv 2024-07-11T13:43:30Z
dc.date.issued.none.fl_str_mv 2024
dc.date.submitted.none.fl_str_mv 2024
dc.type.none.fl_str_mv bachelorThesis
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_7a1f
dc.identifier.uri.none.fl_str_mv https://repositorio.unimagdalena.edu.co/handle/123456789/21215
url https://repositorio.unimagdalena.edu.co/handle/123456789/21215
dc.language.iso.fl_str_mv eng
language eng
dc.rights.none.fl_str_mv Acceso Abierto
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.accessrights.none.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.cc.none.fl_str_mv Acceso Abierto
dc.rights.creativecommons.spa.fl_str_mv atribucionnocomercialcompartir
rights_invalid_str_mv Acceso Abierto
atribucionnocomercialcompartir
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv text
dc.publisher.none.fl_str_mv Universidad del Magdalena
dc.publisher.department.none.fl_str_mv Facultad de Ingeniería
dc.publisher.program.none.fl_str_mv Ingeniería de Sistemas
publisher.none.fl_str_mv Universidad del Magdalena
institution Universidad del Magdalena
bitstream.url.fl_str_mv https://repositorio.unimagdalena.edu.co/bitstreams/22c6baa0-56cd-4bb7-a9cc-d52044820d9c/download
https://repositorio.unimagdalena.edu.co/bitstreams/89856b75-2e7a-4d0e-aa14-9eb0039fb813/download
https://repositorio.unimagdalena.edu.co/bitstreams/72e9e210-fbac-4e2d-9c39-e529cb3f88ff/download
bitstream.checksum.fl_str_mv 6742e99f1bfa1457ec7e607813b4ddee
55a0e8f56af35c7d44385ed7d87efd81
03de826a7ba30b30f95ba9233c6ed790
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Institucional UniMagdalena
repository.mail.fl_str_mv repositorio@unimagdalena.edu.co
_version_ 1814127064677089280