Getting Started with Keyword Extraction Using Google Gemini
Keyword extraction is important for use cases that require exact word matches, such as tagging, full-text search, and filtering. Traditional keyword extraction involves a complex natural language processing (NLP) pipeline, but large language models (LLMs) make the task much simpler. In this post, we will show you how to extract relevant keywords from text using the Google Gemini API.
Setting Up Google Gemini API Key
Sign up for or sign in to your Google Gemini account. Then navigate to the API key page and click “Create API Key”. Once your API key is generated, save it somewhere safe and do not share it with anyone.
Install the Google Gemini Python package
!pip install -q -U google-generativeai
Initialize the Gemini API client
import google.generativeai as genai
GOOGLE_API_KEY="<YOUR_GEMINI_API_KEY>"
genai.configure(api_key=GOOGLE_API_KEY)
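Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The variable name `GEMINI_API_KEY` and the helper `load_api_key` are illustrative assumptions, not part of the Gemini package.

```python
import os

def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name you chose when exporting the key.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return api_key
```

With the variable set, the client could then be configured with `genai.configure(api_key=load_api_key())`.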
Extract Relevant Keywords
A prompt is an instruction that tells an LLM how to process the input and format its response. Prompt engineering is the practice of crafting the right prompt to get the best output from an LLM.
Let’s say we want to create a job recommender system. We can use Google Gemini to extract category and skill keywords for each job posting and for a user resume.
Prompt Templates:
prompt_template_job = """
Extract industries, job functions and skill keywords from the following job description. Provide the response as a JSON object with the following format:
{
"industry": list of industries,
"function": list of job functions,
"skill": list of skills
}
Job description:
<job_description>
"""
prompt_template_resume = """
Extract industries, roles and skill keywords from the following resume. Provide the response as a JSON object with the following format:
{
"industry": list of industries,
"function": list of job functions,
"skill": list of skills
}
Resume:
<resume>
"""
Function to extract relevant keywords from a job description or a resume:
import json

def get_keyword(text, model="gemini-1.5-flash", task="job"):
    """Extract industry, function and skill keywords from a job description or resume."""
    # Pick the prompt template and its placeholder for the requested task
    if task == "job":
        prompt_template = prompt_template_job
        text_to_replace = "<job_description>"
    elif task == "resume":
        prompt_template = prompt_template_resume
        text_to_replace = "<resume>"
    else:
        raise ValueError(f"Unsupported task type: {task}")
    # Collapse newlines so the input text sits on a single line in the prompt
    text = text.replace("\n", " ")
    prompt = prompt_template.replace(text_to_replace, text)
    gemini_model = genai.GenerativeModel(model)
    # Ask Gemini to return JSON so the response can be parsed directly
    response = gemini_model.generate_content(
        prompt,
        generation_config={"response_mime_type": "application/json"},
    )
    if response is None:
        print("Error extracting keywords")
        return None
    try:
        return json.loads(response.text)
    except json.JSONDecodeError:
        print("Error processing JSON response:", response.text)
        return None

def flatten_keywords(keywords_json):
    flattened_keywords = []
    for key in ["industry", "function", "skill"]:
        if key in keywords_json:
            flattened_keywords += keywords_json[key]
    return list(set(flattened_keywords))
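Because the matching step compares keywords as exact strings, differences in case or stray whitespace (for example "Python" and "python ") would count as mismatches, and the model is not guaranteed to return every key as a list of strings. The following is a rough sketch of a cleanup step that could run on the parsed response before flattening. `clean_keywords` is a hypothetical helper, and lowercasing is just one possible normalization choice.

```python
def clean_keywords(keywords_json, keys=("industry", "function", "skill")):
    """Return a dict with every expected key mapped to a list of normalized strings."""
    cleaned = {}
    for key in keys:
        values = keywords_json.get(key, []) if isinstance(keywords_json, dict) else []
        # A single string is treated as a one-item list
        if isinstance(values, str):
            values = [values]
        elif not isinstance(values, list):
            values = []
        normalized = []
        for value in values:
            if not isinstance(value, str):
                continue  # skip anything that is not text
            # Lowercase and collapse runs of whitespace
            keyword = " ".join(value.lower().split())
            if keyword and keyword not in normalized:
                normalized.append(keyword)
        cleaned[key] = normalized
    return cleaned
```

The cleaned dictionary has the same shape as the parsed response, so it can be passed to `flatten_keywords` unchanged.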
Calculate Keyword Matching
Then we can calculate a keyword matching score, such as the Jaccard similarity (the size of the intersection of two keyword sets divided by the size of their union), between each job posting’s list of skills and the list of skills in the user’s resume. The job most suitable for the user should have the highest matching score.
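def calculate_jaccard_similarity(list1, list2):
    set1 = set(list1)
    set2 = set(list2)
    # Size of the intersection of the two sets
    intersection = len(set1.intersection(set2))
    # Size of the union of the two sets
    union = len(set1.union(set2))
    # Two empty keyword lists have nothing in common; avoid dividing by zero
    if union == 0:
        return 0.0
    return intersection / union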
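To make the ranking step concrete, here is a small sketch that scores a few job postings against a resume and sorts them by score. The keyword lists are made-up sample data rather than real Gemini output, `rank_jobs` is a hypothetical helper, and the Jaccard computation is repeated inline only so the snippet runs on its own.

```python
def jaccard(list1, list2):
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    # Return 0.0 when both lists are empty to avoid dividing by zero
    return len(set1 & set2) / len(union) if union else 0.0

def rank_jobs(resume_skills, jobs):
    """Return (job_id, score) pairs sorted from best to worst match."""
    scores = [(job_id, jaccard(resume_skills, skills)) for job_id, skills in jobs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Made-up sample data for illustration only
resume_skills = ["python", "sql", "machine learning"]
jobs = {
    "data_scientist": ["python", "machine learning", "statistics"],
    "backend_engineer": ["java", "sql", "docker"],
    "graphic_designer": ["photoshop", "illustrator"],
}

ranking = rank_jobs(resume_skills, jobs)
for job_id, score in ranking:
    print(f"{job_id}: {score:.2f}")
```

With this sample data, the data scientist posting shares two of four distinct skills with the resume (score 0.50), ahead of the backend engineer posting (0.20) and the graphic designer posting (0.00).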
You can see a complete example in this Google Colab Notebook.