The eScience Workshop is aimed at researchers whose interests relate to eScience (data-driven science), covering a broad range of artificial intelligence topics, including machine learning, pattern recognition, data analysis/mining/integration (recommender systems, big data), inference and analysis of complex networks, and computer vision, with interdisciplinary applications across many domains.
The eScience Workshop took place on June 22, 2017, organized by Bernardo Nunes Gonçalves (IBM Research), Carlos da Silva dos Santos (UFABC), and David Correa Martins Jr (UFABC), and featured the following talks:
- 9:05 – Gene interaction networks inference and search for complex disease biomarkers by complex networks analysis and data integration (Slides)
- Speaker: David Correa Martins Jr (CMCC-UFABC)
- Abstract: Systems Biology is an interdisciplinary field that aims to study the functioning of living beings as a complex network of interactions among several biological entities. This field presents several important open problems, such as gene regulatory network (GRN) inference and the prioritization of genes (or biomarkers) potentially associated with complex diseases. Regarding the GRN inference problem, this presentation will discuss the basic concepts and present some results and ongoing research involving feature selection and prior knowledge about the global topological structure of GRNs (a mix of scale-free and small-world topologies). Gene prioritization is another important topic: the main challenge in studying complex diseases is that they are polygenic and multifactorial, with the environment also playing an important role. This presentation will briefly discuss a methodology that integrates PPI networks with disease-specific data sources, such as genome-wide association studies (GWAS) and gene expression data from control and disease conditions, to find genes more specific to a given complex disease. Results and ongoing work on this topic will be presented.
- 9:50 – Symbolic Regression or: How I Learned to Worry About my Machine Learning Models (Slides)
- Speaker: Fabrício Olivetti de França (CMCC-UFABC)
- Abstract: Many Machine Learning algorithms make either simplistic assumptions (Linear Regression, k-NN) or obscure ones (Neural Networks) in order to find a model that fits a given data sample. The problem with the former is their limited ability to handle nonlinear relationships, while the latter limit the interpretability of the resulting model. The lack of interpretability raises concerns about trust in the model and limits its usefulness for the study being performed. Symbolic Regression, on the other hand, aims to find a nonlinear model that fits the sampled data while maximizing interpretability. This talk will present the current state of Symbolic Regression, its current limitations, and future perspectives.
- 10:55 – Weak supervised and unconventional classification problems (Slides)
- Speaker: Ronaldo Cristiano Prati (CMCC-UFABC)
- Abstract: Traditional classification learning algorithms are based on the assumption that a single target attribute is associated with the classification task, and that the class values of the available training instances are correctly determined in the available data set. Furthermore, the instances drawn for this training set are assumed to be independent and identically distributed. However, due to the widespread use of machine learning algorithms in real-world tasks, these assumptions are often violated. In this talk, I will present some ongoing research on weakly supervised (the class associated with each instance cannot be assumed to be true, or the sample may be biased) and unconventional classification problems.
- 11:40 – A data-driven workflow for predicting the production of oil and gas at horizontal wells using vertical well logs (Slides)
- Speaker: Jorge Guevara Diaz (IBM Research)
- Abstract: In recent work, a data-driven sweet-spotting technique was proposed for shale plays previously explored with vertical wells. Here, we extend this technique to multiple formations and formalize a general data-driven workflow to facilitate feature extraction from vertical well logs and predictive modeling of horizontal well production. We also develop an experimental framework that facilitates model selection and validation in a realistic drilling scenario. We present some experimental results using this methodology in a field with 90 vertical wells and 98 horizontal wells, showing that it can achieve better predictive ability than kriging of known production values.
- 13:55 – Hypothesis management in support of the e-scientific method (Slides)
- Speaker: Bernardo Nunes Gonçalves (IBM Research)
- Abstract: Ten years have passed since Jim Gray and others first acknowledged the advent of a ‘transformed scientific method.’ Nonetheless, while research on eScience is still mostly associated with hardware and software infrastructure for data-driven science, including tracking the provenance of how scientists run their computational experiments, the epistemological aspects of such an e-scientific method are still barely explored. In this talk I will focus on a specific problem in that landscape: data management support for experiments whose purpose is to assess rival hypotheses, the so-called ‘crucial experiments.’ Along those lines, I will show results on the automatic synthesis of a (U-relational) probabilistic database out of rival sets of mathematical equations and their related datasets stored in files.
- 14:40 – Knowledge Transfer in Reinforcement Learning (Slides)
- Speaker: Anna Helena Reali Costa (Escola Politécnica – USP)
- Abstract: Humans are very good at learning from their own experience and using knowledge gained from solving past problems to efficiently solve new, similar problems. How can we build artificial agents with such capabilities? In this talk I focus on Reinforcement Learning (RL), a method that has been successfully applied to build autonomous agents that solve challenging sequential decision-making problems by interacting with the environment. In RL, an agent explores the space of possible strategies to solve a task in a given environment, receives feedback (a reward) on the outcome of the actions it takes, and deduces a behavior policy from its observations over time. However, agents need a long time to learn a task in this setting. To speed up this procedure, I outline approaches that allow agents to leverage experience gained from solving previous RL tasks. Some applications will also be presented.
- 15:25 – Images, Pattern Recognition and Machine Learning at IME/USP (Slides)
- Speaker: Nina S. T. Hirata (IME-USP)
- Abstract: Image/data analysis and computer vision are central to several of the research projects carried out at the Department of Computer Science of IME/USP. Pattern recognition and machine learning techniques, together with appropriate models, are used to tackle image/data analysis and computer vision problems in several application domains. In this talk I will highlight some ongoing research projects and related research challenges. These projects deal with different types of images/data, such as plankton images, optical astronomical images, document images, handwriting, and medical images.
- 16:30 – Pattern Recognition and Image Analysis Research @WWU (Slides)
- Speaker: Xiaoyi Jiang (University of Münster – Germany)
- Abstract: In this talk I will present an overview of the research activities in my group. Particular focus will be given to biomedical imaging and consensus learning.
- 17:15 – São Paulo Research Foundation (FAPESP) and International Collaboration
- Speaker: Roberto Marcondes Cesar-Jr (IME-USP)
- Abstract: The São Paulo Research Foundation (FAPESP) was established in 1962 to support research activities in the state of São Paulo. FAPESP and research in São Paulo will be discussed in this talk. Special focus will be given to funding opportunities for international collaboration, including mobility, grants, and scholarships.