Prompt engineering is the art and science of crafting inputs that produce desired outputs from language models. The same model can generate brilliant insights or nonsense depending on how you ask. Prompt engineering is not tricking the model or finding magic words. It is communicating intent clearly, providing appropriate context, and structuring output expectations.
I have spent thousands of hours refining prompts for classification, extraction, generation, and reasoning tasks. I have learned that small changes in phrasing produce dramatically different results. I have seen chain-of-thought prompting transform unreliable reasoning into consistent logic. I have struggled with output format compliance and developed patterns that enforce structure. If you are looking for ready-to-use prompts for business operations, see our collection of prompt engineering templates for small business. This guide covers the patterns that work: fundamental prompting approaches from zero-shot to few-shot, chain-of-thought reasoning for complex problems, structured output enforcement, role prompting for specialized behavior, and systematic optimization strategies.
The Fundamentals
The Anatomy of a Prompt
System prompt: Sets behavior, constraints, and persona.
User message: The specific task or question.
Context: Background information needed to complete the task.
Instructions: How to approach the task.
Output format: Expected structure for the response.
[SYSTEM]
You are a senior software engineer reviewing code for security vulnerabilities.
Be thorough but concise. Focus on SQL injection, XSS, and authentication issues.
[USER]
Review the following Python function for security issues:
def get_user(user_id):
query = f"SELECT * FROM users WHERE id = {user_id}"
return db.execute(query)
Provide your findings as:
1. Vulnerability name
2. Severity (Critical/High/Medium/Low)
3. Explanation
4. Recommended fix
Zero-Shot Prompting
The model completes the task with no examples.
Classify the sentiment of this product review:
"This phone has amazing battery life and the camera is incredible.
Best purchase I've made this year."
Sentiment:
When to use:
- Simple, well-defined tasks
- Model has strong training on the task
- Speed is important
Limitations:
- May misunderstand format expectations
- Less consistent than few-shot
- Struggles with novel complex tasks
Few-Shot Prompting
Provide examples of desired input-output pairs.
Classify the sentiment of these product reviews:
Review: "Terrible quality, broke after one day. Waste of money."
Sentiment: Negative
Review: "Decent product, not great but worth the price."
Sentiment: Neutral
Review: "Absolutely love it! Exceeded all expectations."
Sentiment: Positive
Review: "Shipping was fast but the item doesn't match description."
Sentiment:
Guidelines:
- Include 2-5 examples for most tasks
- Show variety in inputs
- Format examples exactly as desired output
- Place examples immediately before the target task
Example selection:
- Choose diverse, representative examples
- Include edge cases
- Ensure examples are correct (models amplify errors)

Chain-of-Thought Prompting
Basic Pattern
Encourage step-by-step reasoning.

Solve this math problem step by step:
A store has 50 apples. They sell 15 in the morning and get a delivery
of 30 more in the afternoon. How many apples do they have at the end
of the day?
Let's solve this step by step:
1. Start with initial apples: 50
2. Subtract morning sales: 50 - 15 = 35
3. Add afternoon delivery: 35 + 30 = 65
4. Final count: 65 apples
Answer: 65 apples
Self-Consistency
Generate multiple reasoning paths and take the most common answer.
answers = []
for _ in range(5):
response = model.generate(
prompt,
temperature=0.7 # Add randomness
)
answers.append(parse_answer(response))
# Take majority vote
final_answer = most_common(answers)
Tree of Thoughts
Trace multiple reasoning branches.
Problem: Find the shortest path from A to D in this graph.
Approach 1: Via B
- A to B: 5 units
- B to D: 8 units
- Total: 13 units
Approach 2: Via C
- A to C: 3 units
- C to D: 7 units
- Total: 10 units
Comparison: Approach 2 (10 units) is shorter than Approach 1 (13 units).
Best path: A → C → D (10 units)
Structured Output Patterns
JSON Mode
Enforce JSON output (supported by GPT-4, Claude 3).

response = client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": """Extract the following information from this text:
"John Smith is a software engineer at Google with 5 years of experience.
He lives in San Francisco and specializes in machine learning."
Return as JSON with fields:
- name: string
- role: string
- company: string
- years_experience: number
- location: string
- specializations: array of strings"""
}],
response_format={"type": "json_object"} # Force JSON
)
# Parse result
import json
data = json.loads(response.choices[0].message.content)
Schema Enforcement
Provide explicit type information.
Extract information from the invoice and return valid JSON:
Required format:
{
"invoice_number": string, // Format: INV-YYYY-NNNN
"date": string, // ISO 8601 date
"total": number, // Decimal, 2 places
"items": [
{
"description": string,
"quantity": integer,
"unit_price": number
}
],
"vendor": {
"name": string,
"tax_id": string // Optional
}
}
Invoice text:
[invoice content here]
Function Calling / Tool Use
Define output as function parameters.
tools = [
{
"type": "function",
"function": {
"name": "extract_meeting_info",
"description": "Extract meeting details from text",
"parameters": {
"type": "object",
"properties": {
"title": {"type": "string"},
"date": {"type": "string", "format": "date"},
"time": {"type": "string", "format": "time"},
"attendees": {
"type": "array",
"items": {"type": "string"}
},
"agenda_items": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["title", "date", "time"]
}
}
}
]
response = client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": "Meeting: Project kickoff tomorrow at 2pm with Alice, Bob, and Carol. "
"We'll discuss timeline and resource allocation."
}],
tools=tools,
tool_choice={"type": "function", "function": {"name": "extract_meeting_info"}}
)
# Extract structured data
args = json.loads(
response.choices[0].message.tool_calls[0].function.arguments
)
Role Prompting
Persona Assignment
Set expertise and perspective.

You are a senior DevOps engineer with 10 years of experience managing
AWS infrastructure. You are reviewing a Terraform configuration for
security and best practices. Be thorough and mention specific AWS
services and features.
Multi-Persona Debate
Have the model argue from multiple perspectives.
Analyze this architecture decision from three perspectives:
1. As a security engineer: Focus on data protection and access control
2. As a performance engineer: Focus on scalability and latency
3. As a cost engineer: Focus on resource efficiency and pricing
For each perspective, provide:
- Main concerns
- Potential issues
- Recommendations
Architecture: [description here]
Advanced Techniques
ReAct (Reasoning + Acting)
Combine reasoning with tool use.
You can use these tools:
- search(query): Search documentation
- calculate(expression): Perform calculations
- get_weather(location): Get weather data
Question: What is the population of Tokyo divided by the population of
Osaka? Consider only the city proper, not metropolitan area.
Thought: I need to find the populations of both cities.
Action: search("population of Tokyo city proper 2026")
Observation: Tokyo (city proper) has a population of 14.09 million
Thought: Now I need Osaka's population.
Action: search("population of Osaka city proper 2026")
Observation: Osaka (city proper) has a population of 2.75 million
Thought: Now I can calculate the ratio.
Action: calculate("14.09 / 2.75")
Observation: 5.123636...
Answer: The population of Tokyo is approximately 5.12 times that of Osaka.
Prompt Chaining
Break complex tasks into sequential prompts.
def analyze_document(document):
# Step 1: Summarize
summary = llm.generate(f"Summarize this document in 3 sentences:\n{document}")
# Step 2: Extract key points
key_points = llm.generate(
f"Based on this summary, extract 5 key points:\n{summary}",
output_format="bullet_list"
)
# Step 3: Generate action items
actions = llm.generate(
f"Based on these key points, suggest action items:\n{key_points}"
)
return {
'summary': summary,
'key_points': key_points,
'actions': actions
}
Retrieval-Augmented Generation
Ground prompts in external knowledge. For a detailed guide on building the infrastructure for semantic search and document ingestion, read our article on building a RAG system for your business.
def answer_with_context(question, knowledge_base):
# Retrieve relevant documents
relevant_docs = knowledge_base.search(question, k=3)
context = "\n\n".join([doc.text for doc in relevant_docs])
prompt = f"""Answer the question based on the provided context.
Context:
{context}
Question: {question}
If the context does not contain the answer, say "I don't have
sufficient information to answer this question."
Answer:"""
return llm.generate(prompt)
Prompt Optimization
A/B Testing
Compare prompt variations systematically. When running large-scale tests, you should monitor token costs and API latency. For practical strategies on reducing model fees during development and production, see our guide to optimizing LLM token costs.

prompts = [
"Classify the sentiment:",
"Is this review positive, negative, or neutral?",
"Rate the sentiment from -1 (very negative) to 1 (very positive):"
]
results = {}
for prompt in prompts:
correct = 0
for example in test_set:
response = llm.generate(f"{prompt}\n\n{example.text}")
if parse_sentiment(response) == example.label:
correct += 1
accuracy = correct / len(test_set)
results[prompt] = accuracy
best_prompt = max(results, key=results.get)
Prompt Versioning
Track prompt changes and performance.
# prompts.yaml
classification_v1:
template: "Classify: {text}"
accuracy: 0.78
classification_v2:
template: |
Classify the following text as positive, negative, or neutral.
Text: {text}
Classification:
accuracy: 0.85
improved: true
Temperature and Sampling
Temperature:
- 0.0: Deterministic, best for structured output
- 0.7: Balanced creativity
- 1.0: Maximum creativity
Top-p: Alternative to temperature, consider tokens until cumulative probability exceeds threshold.
# Consistent classification
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.0 # Deterministic
)
# Creative generation
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.9 # Creative
)
Domain-Specific Patterns
Code Generation
Write a Python function that [description].
Requirements:
- Include type hints
- Add docstring with examples
- Handle edge cases
- Follow PEP 8 style
- Include unit tests as comments
Function signature: def function_name(param: type) -> return_type:
Data Extraction
Extract structured data from this unstructured text.
Text: [unstructured text]
Extract:
1. Person names
2. Organization names
3. Dates
4. Monetary amounts
5. Email addresses
Return as JSON array with fields: type, value, context (surrounding text)
Summarization
Summarize the following article for a technical audience.
Constraints:
- Maximum 200 words
- Include 3 key takeaways as bullet points
- Maintain original tone
- Preserve technical accuracy
Article:
[article text]
Common Pitfalls
Pitfall 1: Vague Instructions
“Analyze this text” → “Extract named entities and their relationships”
Pitfall 2: No Output Format
No format specified → Inconsistent structure
Pitfall 3: Leading Questions
“Don’t you think X is bad?” → Biased responses
Pitfall 4: Too Much Context
Unrelated information dilutes focus
Pitfall 5: Assuming Knowledge
Acronyms without definitions
Pitfall 6: Not Testing Edge Cases
Works on common cases, fails on unusual inputs
Pitfall 7: Ignoring Security Vulnerabilities
Passing untrusted user input directly into prompt templates exposes your application to prompt injection attacks, allowing users to override system constraints. Check our guide on preventing prompt injection attacks to learn how to secure your pipelines.
Conclusion
Prompt engineering is a skill developed through practice and measurement. Start with clear, specific instructions. Provide examples for complex tasks. Use chain-of-thought for reasoning. Enforce structure through formatting instructions or function calling.
Test prompts systematically. A/B test variations. Version your prompts. Measure accuracy against labeled data.
The goal is not clever tricks but clear communication of intent to the model. The model wants to help; your job is to explain what you need.
Invest in prompt engineering. Good prompts make the difference between unreliable demos and production-ready AI features.
Further Reading
- OpenAI Prompt Engineering Guide: Official best practices
- Anthropic Claude Documentation: Claude-specific guidance
- “Chain-of-Thought Prompting Elicits Reasoning in LLMs”: Research paper
- Prompt Engineering Guide (promptingguide.ai): In-depth techniques
- LangChain Documentation: Prompt templates and chaining