Modelo de un Meta Buscador que Realiza Agrupación de Documentos Web, Enriquecido con una Taxonomía, Ontologías e Información del Usuario

Authors: Cobos Lozada, Carlos Alberto

Resource type: Doctoral thesis

Publication date: 2013

Institution: Universidad Nacional de Colombia

Repository: Universidad Nacional de Colombia

Language: spa

id	UNACIONAL2_aed7ce3479d50c3acfea29ef41d0e6ad
oai_identifier_str	oai:repositorio.unal.edu.co:unal/52281
network_acronym_str	UNACIONAL2
network_name_str	Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv	Modelo de un Meta Buscador que Realiza Agrupación de Documentos Web, Enriquecido con una Taxonomía, Ontologías e Información del Usuario
dc.creator.fl_str_mv	Cobos Lozada, Carlos Alberto
dc.contributor.author.spa.fl_str_mv	Cobos Lozada, Carlos Alberto
dc.contributor.spa.fl_str_mv	León Guzmán, Elizabeth
dc.subject.ddc.spa.fl_str_mv	0 Generalidades / Computer science, information and general works; 62 Ingeniería y operaciones afines / Engineering
dc.subject.proposal.spa.fl_str_mv	Clustering search results; Web clustering engine; Taxonomies; Ontologies; Memetic algorithm; Global-best harmony search; Balanced Bayesian information criterion; Cuckoo search; Hyper-heuristic approach; User modeling; Meta-search engine; Personalized information retrieval; Semantic search engine; Agrupación de resultados web; Motor que agrupa documentos web; Taxonomías; Ontologías; Algoritmos meméticos; Mejor búsqueda armónica global; Criterio bayesiano de información balanceado; Búsqueda cucú; Enfoque híper heurístico; Modelamiento de usuario; Meta buscador; Recuperación de información personalizada; Motor de búsqueda semántica
description	The central theme of this Ph.D. thesis is effective web search. The author combines the strengths of thematic indices, traditional web search engines, and meta web search engines while avoiding the weaknesses each has when operating in isolation. A general taxonomy of knowledge, ontologies, and user information (user profile and user feedback) are combined with the clustering of web results in a meta-search model that presents the user with only the most relevant results (documents), thereby reducing the time users spend on searches. The proposed model has five main components. The first supports expansion of the user's query based on the semantic relationships (extracted from ontologies organized in a taxonomic hierarchy) among the terms that each user has stored in their profile. The second acquires search results from traditional web search engines (Google, Yahoo! and Bing). The third pre-processes documents and generates two representations of them, one based on the vector space model and the other on frequent phrases. The fourth constructs and labels clusters, using three heuristic algorithms that cluster on the vector space representation of the results and label on the frequent-phrase representation. The fifth visualizes the resulting clusters: it presents the search results organized into thematic groups (folders) and updates the user profile based on user feedback (relevant or not relevant).
The cluster construction and labeling component is supported by three new heuristic algorithms based on the following global search strategies: global-best harmony search, cuckoo search, and a genetic algorithm. Each algorithm employs K-means as a local search improvement strategy. A new fitness function, the Balanced Bayesian Information Criterion, guides the evolution process of these algorithms; it was derived using a genetic programming approach. A hyper-heuristic framework is also presented and used to evaluate a wide set of heuristics that can be applied to the problem of web result clustering. The model and the algorithms are evaluated on synthetic data sets (from traditional repositories) and on answers provided by a real population of users. The evaluation uses traditional validation metrics from the information retrieval field (precision, recall, F-measure, accuracy, and fall-out) and user satisfaction measures (utility of each cluster, precision of the allocation of documents to each cluster and their order, quality of the labels for each cluster, and the Subtopic Search Length under k document sufficiency (SSLk) measure, used to assess how easily users can use the clustering results). The results are compared against those of other state-of-the-art algorithms, among them Bisecting K-means, STC, and Lingo.
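The information retrieval metrics named in the description (precision, recall, F-measure, accuracy, and fall-out) have standard definitions over a confusion matrix. The following is a minimal sketch of those textbook definitions, not the evaluation code used in the thesis. The names `Confusion` and `ir_metrics` are hypothetical, the balanced (F1) form of the F-measure is assumed, and returning 0.0 for an empty denominator is a convention chosen here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for one cluster or class (hypothetical helper type)."""
    tp: int  # relevant documents placed in the cluster
    fp: int  # non-relevant documents placed in the cluster
    fn: int  # relevant documents left out of the cluster
    tn: int  # non-relevant documents left out of the cluster


def _ratio(numerator: float, denominator: float) -> float:
    # Assumed convention: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def ir_metrics(c: Confusion) -> dict:
    """Return the five standard metrics computed from the counts in c."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f_measure = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    # Fall-out: share of non-relevant documents that were retrieved.
    fall_out = _ratio(c.fp, c.fp + c.tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "fall_out": fall_out,
    }


if __name__ == "__main__":
    # Illustrative counts only; they are not results from the thesis.
    for name, value in ir_metrics(Confusion(tp=6, fp=2, fn=4, tn=8)).items():
        print(f"{name}: {value:.3f}")
```

When several clusters are evaluated, these per-cluster values are commonly combined by weighted averaging; how the thesis aggregates them is not stated in this record.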
publishDate	2013
dc.date.issued.spa.fl_str_mv	2013
dc.date.accessioned.spa.fl_str_mv	2019-06-29T13:57:07Z
dc.date.available.spa.fl_str_mv	2019-06-29T13:57:07Z
dc.type.spa.fl_str_mv	Degree work - Doctorate
dc.type.driver.spa.fl_str_mv	info:eu-repo/semantics/doctoralThesis
dc.type.version.spa.fl_str_mv	info:eu-repo/semantics/acceptedVersion
dc.type.coar.spa.fl_str_mv	http://purl.org/coar/resource_type/c_db06
dc.type.content.spa.fl_str_mv	Text
dc.type.redcol.spa.fl_str_mv	http://purl.org/redcol/resource_type/TD
format	http://purl.org/coar/resource_type/c_db06
status_str	acceptedVersion
dc.identifier.uri.none.fl_str_mv	https://repositorio.unal.edu.co/handle/unal/52281
dc.identifier.eprints.spa.fl_str_mv	http://bdigital.unal.edu.co/46605/
url	https://repositorio.unal.edu.co/handle/unal/52281 http://bdigital.unal.edu.co/46605/
dc.language.iso.spa.fl_str_mv	spa
language	spa
dc.relation.ispartof.spa.fl_str_mv	Universidad Nacional de Colombia, Sede Bogotá, Facultad de Ingeniería
dc.relation.references.spa.fl_str_mv	Cobos Lozada, Carlos Alberto (2013) Modelo de un Meta Buscador que Realiza Agrupación de Documentos Web, Enriquecido con una Taxonomía, Ontologías e Información del Usuario. Doctoral thesis, Universidad Nacional de Colombia.
dc.rights.spa.fl_str_mv	Rights reserved - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv	Attribution-NonCommercial 4.0 International
dc.rights.uri.spa.fl_str_mv	http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.mimetype.spa.fl_str_mv	application/pdf
institution	Universidad Nacional de Colombia
bitstream.url.fl_str_mv	https://repositorio.unal.edu.co/bitstream/unal/52281/1/299810.2013.pdf https://repositorio.unal.edu.co/bitstream/unal/52281/2/299810.2013.pdf.jpg
bitstream.checksum.fl_str_mv	cca7c420eb2f4e4f6f244966b38bd468 972a648513646152510a82edc5a607ab
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv	repositorio_nal@unal.edu.co
_version_	1814089483102978048
