Our MBA in Data Journalism students produced a series of tutorials as their final work in the Low Code discipline: Transforming data into guidelines without programming, taught by professor Adriano Belisário. This month you will be able to check out some of their work and have fun with the tutorials they created. Today you can check out the tutorial made by Mariana Hallal.
Vaccination against Covid-19 in Brazil began with groups most vulnerable to the disease or who are most exposed to the virus. Elderly people, people living in long-term care facilities, healthcare professionals and indigenous people were at the top of the list.
Recently, in May, people with comorbidities began to be covered. See how to find out the impact of this vaccination on the number of people hospitalized. To do this, we will explore the Sivep-Gripe data from the Ministry of Health in SQL and Google Sheets.
Step by step
1 - Download data from the Ministry of Health. For this analysis, we will only use data from 2021. Click here to download the spreadsheet . To download, you need to click on “SRAG” > “Explore” > “Download”
2 - If you have never used SQL, you need to download the program. Let's use DB Browser for SQLite. Download here .
3 - In SQL, click “New Database”, name this database “comorbidity” and save.
3 - A window will open asking you to create a table. Click cancel because we are going to import the data.
4 - Click “File” > “Import” > “Table from CSV file”
5 - Select the file we downloaded from the Ministry of Health, configure the table as shown in the image and click OK
7 - The table name contains elements that can impair the analysis, such as the hyphen (-). Let's rename the table to eliminate this problem. Right-click on the table name and click “Modify table”.
8 - Rename the table “srag2021” and click OK.
9 - Now let's move on to the analysis. Let's select the columns we will use. Click on “Run SQL”, type the command below and press the “play” button.
SELECT FACTOR_RISC, substr (DT_INTERNA, 4, 10) AS mes_interna
FROM srag2021
WHERE CLASSI_FIN = 5
The SELECT function tells the program which columns we want. The FROM function tells us which table we want to pull these columns from. The WHERE function makes a filter. In this case, we only want the lines where the final rating is 5 (covid).
The term “substr” at the end of the SELECT function is used to cut characters from an entire column. Let's apply it to the DT_INTERNA column to only have the month and year of hospitalization - this will make the analysis easier.
10 - This table is the one we will use. So, let's create a view to be able to work better with it. Click “Save results view” > “Save as view”. Name this view “factorrisk”
11 - Press two Enters after the last line and let's go to another sentence. Our ultimate goal is to find out how many % of hospitalized patients had a risk factor each month. So let's ask SQL to count how many people with and without risk factors were hospitalized each month.
SELECT *, count (RISC_FATOR) AS totalfactor
FROM factorderisco
WHERE mes_interna LIKE '%2021%'
GROUP BY internal_month, RISC_FATOR
With this sentence, we filter only the months of 2021. Although the spreadsheet, theoretically, only contains data from 2021, there are many typing errors in the “date” field that hinder the analysis.
After typing the sentence, select it and click “Execute”.
12 - Let's save this result and move it to Google Sheets. Click on the same save view button, choose “Export to CSV”, click on “save” and name it “risk factor”.
13 - Create a new spreadsheet in Google Sheets. Click “File” > “Import” > “Upload” > “Select a file from your device”. Uncheck the box that says “Convert text to numbers, dates and formulas” and click “Import data”.
14 - Click “View” > “Freeze” > “1 row” to mark the first row as the header.
15 - Select the “totalfactor” column, click the “123” button and select the “0” or “number” option.
16 - Let's create a pivot table. Click “Data” > “PivotTable” > “Create”. Click “Add” next to “Rows” in the left menu and select “mes_interna”.
Click “Add” next to “Values” and select “totalfactor”. In the “Summarize by” tab, select “SUM”.
Copy the result and paste only the values (CTRL + SHIFT + V) into cell F1 of the main tab.
17 - In cell D1 of the first tab, type “total_per_month”. In cell D2, type the following formula: =VLOOKUP(B:B;F:G;2;FALSE).
18 - Double-click the little blue dot in the corner of cell D2 to apply this information to the entire column.
19 - In cell E1, type “percentage”. In cell E2, type the following formula: =C2/D2. Click the blue square in the corner of the cell to apply the formula to the entire column. Select the entire column E, click on “123” and select percentage.
20 - Ready! This is the result. In the FACTOR_RISC column, the number 1 means “yes, the patient has a risk factor” and the number 2 means “the patient has no risk factor”. This information is in the data dictionary .
We were able to observe that in January 65.6% of deaths occurred among people with risk factors. In June, the percentage already fell to 52.37%. Always remember to talk to an expert before drawing conclusions about a database.
Vaccination of people with comorbidities began in May in most states. Therefore, so far, most of these people have only taken the first dose of the vaccine. In the coming weeks, people with risk factors should represent an even smaller group among those hospitalized. To filter by UF, you can put the desired UF in the WHERE clause in the first sentence in SQL.
Please Post Your Comments & Reviews