VLOOKUP from Excel using Python and Streamlit

few hours with…
6 min readFeb 7, 2023

While working on other projects, I accidentally saw the Streamlit framework and it blew my mind. It is an open-sourced framework that helps builds web apps in seconds. Before that, I was using Flask library which is also quite easy to create web apps, but compared to Streamlit it takes much more time. In Streamlit you don't need to think about design, templates, and other front-end stuff, it is already created for you, and for back-end developers, it looks more than nice. And the best part is, you can upload your project on GitHub and use Streamlit servers to push your project online. You will have not the prettiest web address, but if you work with personal projects or projects for small groups of people, I don't think it's a big problem.

In this tutorial, I will show you, how you can create a Streamlit project and push it on the web. As an example, I will use the VLOOKUP function from excel. Excel is a potent tool with lots and lots of options, but the majority of people are using 1%–2% of excel power. I wanted to check if we can extract just some functions which we really need for our work and create them in our own environment. I choose VLOOKUP as the first function because my work colleagues have asked me to find info between multiple data a lot of times, so I created a Streamlit web app so they can easily do it by themselves. It is a little bit upgraded version of VLOOKUP, but the main functionality is the same.

The plan is to create a Streamlit web app where we can upload excel files or templates and the web app will find all the necessary data and connect them together. In the end, we will be able to download a new excel file with the results.

I will use Pycharm for this project, but you can freely use other IDEs. Let's import the necessary libraries. We will need only three

import streamlit as st
import pandas as pd
from io import BytesIO

To understand how Streamlit works, let's just add simple text and open the result in the browser. Add the following code:

st.write("Look up values")

Next, open the terminal and navigate to the project, if you use Pycharm then you can open the Pycharms terminal and write the command:

streamlit run main.py

If you are in right place in the terminal, it will open automatically in the browser. It should look like this:

In the top right corner, you have some additional options. When you change something in the code, there will be an open possibility to rerun the Streamlit page or you can press R as a shortcut.

All page consists of some kind of blocks that are built in the Streamlit framework, like inputs, buttons, sliders, etc. You can find all of them by googling “streamlit widgets”. For this project, we will use three more widgets. First will be “file uploader” so we can upload files, “dataframe” to display results, and “download button” to download the results.

We will use a template excel file, which will be necessary to fill. You can find the excel template here. To easier understand what's going on you can download the filled excel file here and try it. In the first sheet “main_values” we need to add lists of values that need to be paired (look up values). For example, it looks like this:

Second sheet “look_ip_values” we have a list of values that we need to pair with the main values. I shuffled and randomly added/ deleted, so there is no doubt it really looks for values and gives back correct answers.

Now we have some example files so we can start to create code. Let's add to the code widget for uploading files and save the result in the variable. File uploader will allow csv and xlsx files

uploaded_file = st.file_uploader("Choose file", type=["csv", "xlsx"])

Now we can check if the file exists and if it exists let's save data from the first and second sheets in variables:

if uploaded_file is not None:
df_main_values = pd.read_excel(uploaded_file,
sheet_name="main_values")
df_lookup_values = pd.read_excel(uploaded_file,
sheet_name="look_up_values",
index_col=0,
dtype=str)

To see that everything works we can create temporary data frames to see both sheet data:

    st.dataframe(df_main_values)
st.dataframe(df_lookup_values)

It should look like this:

Delete the last two rows of code and let's start building logic. I created very simplified logic, create a dictionary, and just for loop through data and fill the dictionary with matching data. The code could be improved but for example, for purposes let's leave it as simple as possible. In the end, let's add a “dataframe” widget to display the result:

value_dictionary = df_lookup_values.to_dict('index')
main_values_list = df_main_values["Main values"].tolist()

new_df_dict = {
"Main values": [],
"value 1": [],
"value 2": [],
"value 3": [],
"value 4": [],
"value 5": []
}

new_df = pd.DataFrame(new_df_dict)

for main_value in main_values_list:
if main_value in value_dictionary.keys():
new_df.loc[len(new_df.index)] = [main_value,
value_dictionary[main_value]["value 1"],
value_dictionary[main_value]["value 2"],
value_dictionary[main_value]["value 3"],
value_dictionary[main_value]["value 4"],
value_dictionary[main_value]["value 5"]]
else:
new_df.loc[len(new_df.index)] = [main_value, "", "", "", "", ""]

new_df.fillna("", inplace=True)

st.dataframe(new_df)

Now it should look like this:

Two more things before uploading on the real web, we need to add a download button and we need a function that creates an excel file from our data. Let's start with the function:

def to_excel(data_frame):
output = BytesIO()
writer = pd.ExcelWriter(output, engine='xlsxwriter')
data_frame.to_excel(writer, index=False, sheet_name='Sheet1')
workbook = writer.book
worksheet = writer.sheets['Sheet1']
format1 = workbook.add_format({'num_format': '0.00'})
worksheet.set_column('A:A', None, format1)
writer.save()
processed_data = output.getvalue()
return processed_data

And at the end add a download button widget:

    st.download_button(
label="Download result",
data=to_excel(new_df),
file_name='result.xlsx'
)

The final result should look like this:

Now we have finished the coding part which works locally on your computer. To add a project to the web we need to upload main.py to GitHub. If you don't have GitHub then create an account. Create a new project, upload “main.py” and create one more file which is called “requirements.txt” and also upload it to GitHub. In the file copy the following:

pandas
openpyxl
xlsxwriter

This will tell Streamlit what libraries are necessary. The final part is to go to the Streamlits web page and sign in using a GitHub account. Create “New app” and choose “From existing repo”:

Next, fill in the info about your GitHub project. In “Repository” add the project. When clicking on the input block it will start to browse through all your GitHub projects, choose the project which you uploaded on GitHub. If you didn't change anything then the branch should be main. And last is a file path, in this case, it's gonna be a file name for the file you uploaded on GitHub.

Next click “Deploy” and after that, it's going to take some time to publish your project. Just know, if you won't use the web page for 5 days web app will be in “sleeping mode”. It means that after 5 days of the non-using page, you will need to “wake” it, in reality, it means that the app will redeploy and will take some time to do that.

The final result in streamlit can be found here (if it's sleeping, just “wake” it up).

Project on the GitHub can be found here.

In the future, I'm planning to update with some additional popular excel functions, but for now thanks for reading!

--

--

few hours with…

Writing a blog about learning and exploring new stuff just for fun.