Category Archives: Development

Open Data Day 2016: João Pessoa, Brazil


In 2010, the very first edition of Open Data Day was announced, and João Pessoa was one of the few cities in Brazil to participate. Although we did not have many participants back then, we had good media coverage and the event made our developer community a little bit stronger.

This year, motivated by some friends, I decided to try to organize another edition of the event.


My plan was to hold a hackathon again, as in 2010. Since I did not have much time to spend on the organization itself, I decided to do something modest.

After creating a shared document and a Facebook event, I sent messages to mailing lists and posted in a few Facebook groups, and soon I had people registering and proposing ideas on what to do on the event day.

One of the images used to promote the event online.


A research lab from my university (UFPB) contacted me offering help, and we decided to use their facilities to host the event. The lab is called Labtransp (Laboratório de Transparência Pública, or Public Transparency Lab) and they provided some help with the organization. A few days later we discovered that a power shutdown was planned for the entire campus on the event day, so we had to move the event to the other university campus, which is located just 6 km away from the original place. We thank the Centro de Informática (Informatics Faculty) for offering one of their computer labs for the event.

I also sent several emails to TV news and newspapers, but this year none of them showed interest, except the official university news agency, which posted a nice announcement on the official university website.

A few days before the event, I received an email from Open Knowledge and ILDA (The Latin American Open Data Initiative) stating that my proposal for their mini-grant had been accepted. This was good news, as the money allowed us to provide free snacks for all participants.

The event

There were 26 names on the registration form, but only 13 people showed up. From my previous experience, a 50% no-show rate is to be expected at free events. Most of the participants were undergraduate computer science students (including colleagues of mine), but we also had a professor from Labtransp and an employee from our state's IT agency (CODATA) attending the event.

My short presentation about the Open Data Day and Open Data


I started the event with a brief presentation about what the Open Data Day is, what to expect and the schedule for the following hours. Afterwards, we discussed and shared interesting public data sources and formed teams to work together. There was plenty of food and drinks for the participants, who were able to focus on exploring and working with open data.

Thanks to Open Knowledge and ILDA, we were able to provide snacks for the participants


One of the teams focused on data from Bolsa Família, one of Brazil's biggest social welfare programs for reducing poverty. Their goal was to cross-reference regional and temporal data from this program with social and economic indicators. They used tools such as Pentaho and CartoDB.

Another team explored datasets on consumer complaints to find which companies were receiving the most complaints and which regions were most dissatisfied with them. This team mainly used R to explore and process the data and Google Charts to try to create visualizations for it.

Teams working on their projects


One participant decided to analyze party affiliations and public payroll data. His goal was to see whether being affiliated with specific parties was associated with losing or getting a public job.

Lastly, my team explored data from João Pessoa’s City Council. We tried to categorize the bills proposed by each city councilor, as we noticed that many of them were petitions for things such as renaming streets, fixing potholes and granting badges and medals to citizens. We developed a scraper (hosted on GitHub) to collect the data from the official website and built a simple website to show this data.
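The categorization step described above could be sketched roughly as follows. This is not our actual code: the keyword lists, category names and sample records are all made up for illustration, and the real bill summaries are in Portuguese, so the keywords would need to be adapted.

```python
from collections import Counter

# Hypothetical keyword rules; the categories mirror the examples in the text,
# but the keywords themselves are illustrative assumptions.
CATEGORY_KEYWORDS = {
    "street renaming": ["rename", "street name"],
    "street repair": ["pothole", "paving", "repair"],
    "honors": ["medal", "badge", "honor"],
}


def categorize(summary):
    """Return the first category whose keyword appears in the summary."""
    text = summary.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def count_by_councilor(bills):
    """Count categories per councilor from (councilor, summary) pairs."""
    counts = {}
    for councilor, summary in bills:
        counts.setdefault(councilor, Counter())[categorize(summary)] += 1
    return counts


# Made-up sample records, only to show the shape of the input.
sample_bills = [
    ("Councilor A", "Request to rename Rua X"),
    ("Councilor A", "Grants a medal to citizen Y"),
    ("Councilor B", "Request to repair a pothole on Avenida Z"),
    ("Councilor B", "Creates a municipal reading program"),
]

result = count_by_councilor(sample_bills)
for councilor, categories in result.items():
    print(councilor, dict(categories))
```

Simple substring matching like this is crude, but it is probably enough to separate the routine petitions from everything else before looking at the "other" bucket by hand.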

Website developed with the data scraped by our team



Our main goal was to increase the awareness around open data and to build a local community of developers and enthusiasts. Although we had only 13 participants, this was the first contact of almost all of them with Open Data.

We scheduled the event from 8am to 1pm (including the initial presentation and time to form teams), but unfortunately this was clearly not enough to come up with a final product. None of the teams could deliver or show something ready at the end. For the next edition, we plan to allocate more time and start later (9am or 10am). My team continued the development during the following days in order to fully implement our idea.

The event venue (a computer lab in our university) was fine, with enough desks and chairs, a projector and a good Internet connection, but getting there by public transport was not that easy. As far as I could tell, all participants arrived by private transport. For the next event, we plan to choose a more central and accessible venue.

Our venue: classroom at a public university where I study


Finally, we should definitely start the organization a few weeks earlier, so we have more time to get in touch with TV news and newspapers.

Overall, we consider this edition of the event a small but important step in building a stronger local community of open data enthusiasts, and we are looking forward to Open Data Day 2017. We would also like to encourage other groups and individuals to try organizing their own Open Data Day edition in the years to come. It takes just a few hours and a couple of email messages to organize a small edition.

Once again, we thank Labtransp for helping with the organization, and Open Knowledge and ILDA for the mini-grant. More pictures can be found in our Dropbox folder.


Open Data Day João Pessoa 2016


Data analysis on Mitacs Globalink 2015 projects: Part 2 – Word Cloud


Mitacs Globalink Research Internships is a Mitacs program which allows undergraduate students from Brazil, France, China, India, Mexico, Saudi Arabia, Turkey or Vietnam to do a three-month research internship in a university lab in Canada.

This series of posts is a personal attempt to perform some basic data analysis on project information, such as project titles and descriptions. Check Part 1 to see my saga of collecting the data.


The first question I wanted to answer was: “What are most of the projects about?”. I decided that generating a word cloud from the project titles would be an easy and quick way to get an overview of the keywords and topics used to describe the projects.
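Under the hood, a word cloud is just a picture of word frequencies. A minimal sketch of that counting step, assuming the titles are available as a plain list of strings, might look like this (the stop-word list and the sample titles are made up for illustration and are not the real Mitacs data):

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def word_frequencies(titles):
    """Count how often each non-stop word appears across the titles."""
    counts = Counter()
    for title in titles:
        # Runs of letters only (accented letters included, digits excluded).
        words = re.findall(r"[^\W\d_]+", title.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Made-up titles, only to show the shape of the input.
sample_titles = [
    "Analysis of Social Networks",
    "Data Analysis for Web Applications",
    "Modeling of Wireless Networks",
]

frequencies = word_frequencies(sample_titles)
print(frequencies.most_common(3))
```

The resulting counts can then be handed to whatever tool draws the cloud, with the font size of each word scaled by its frequency.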


Data analysis on Mitacs Globalink 2015 projects: Part 1 – The Data


Mitacs Globalink Research Internships is a Mitacs program which allows undergraduate students from Brazil, France, China, India, Mexico, Saudi Arabia, Turkey or Vietnam to do a three-month research internship in a university lab in Canada.

I am interested in taking part in the program, and one of the steps in the application process is to choose between 3 and 7 projects from their list of 1,782 projects (as I write this article). Using their website, you can filter those projects by university, province, language and keywords.


I started performing queries with keywords such as “web” and other areas I am familiar with, but soon I realized that there were many other cool projects I could also apply to. So I ended up manually looking through all 1,700+ project titles and writing down in a text file the ones whose prerequisites and description deserved a closer read.


Mitacs Globalink 2015 projects list.

When I was done, I got really curious about the data. “Which province is offering the most projects?”, “What is the average number of projects being offered per professor?”, “What would a word cloud of project titles look like?”


Analysing a Facebook friendship network

I am taking a course on social networks this semester. As our first assignment, the teacher asked us to analyze our Facebook friendship network so we would get used to working with some tools.

If you have never heard about social networks as a research topic and have no idea what “analyze a friendship network” could possibly mean, take a look at this picture:


Random Facebook friendship network I found on Google. Source: GriffsGraphs.

Here is a small tutorial on how I did my assignment.


Building your own Lucene Scorer

This post is about Apache Lucene, which is a “high-performance, full-featured text search engine library written entirely in Java”. If you have no idea what I am talking about, this tutorial is not for you :). Be advised that this is my first month using Lucene, so there is still a chance that everything I say here is just plain wrong :P. Also, I am currently using version 3.6.1.

While doing an assignment for my Information Retrieval class, I was faced with the problem of creating my own Scorer class in Lucene. When you create a new IndexSearcher, by default Lucene uses DefaultSimilarity, which is essentially cosine similarity (in a Vector Space Model) combined with different weights such as boosts given when indexing, boosts given in the query, tf*idf and document length norm. A description of how exactly it works can be found in the Similarity class documentation and in the Lucene Score documentation.
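To make those weights a bit more concrete, here is a rough restatement in plain Python of the per-term factors as I understand them from the 3.x Similarity documentation. This is not Lucene code and none of these function names belong to the Lucene API. It also leaves out the coord and queryNorm factors, and ignores the fact that Lucene stores norms in a lossy one-byte encoding, so real scores will differ.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Field-length normalization: shorter fields get a larger factor."""
    return 1.0 / math.sqrt(num_terms)


def term_score(freq, doc_freq, num_docs, num_terms, boost=1.0):
    """Per-term contribution: tf * idf^2 * boost * length norm."""
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * boost * length_norm(num_terms)


# A term occurring 4 times in a 100-term field, found in 9 of 1000 documents.
print(term_score(freq=4, doc_freq=9, num_docs=1000, num_terms=100))
```

A document's score for a query is then, roughly, the sum of these contributions over the query terms, multiplied by the two factors omitted here.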
