# API Reference
This reference documents the public API for ProofOfThought.
## ProofOfThought
The main entry point for the reasoning system.
Location: `z3adapter.reasoning.proof_of_thought.ProofOfThought`
### Constructor

```python
def __init__(
    self,
    llm_client: Any,
    model: str = "gpt-5",
    backend: Literal["json", "smt2"] = "smt2",
    max_attempts: int = 3,
    verify_timeout: int = 10000,
    optimize_timeout: int = 100000,
    cache_dir: str | None = None,
    z3_path: str = "z3",
) -> None
```
Parameters:

- `llm_client`: OpenAI/AzureOpenAI client instance
- `model`: Deployment/model name (default: `"gpt-5"`)
- `backend`: `"json"` or `"smt2"` (default: `"smt2"`)
- `max_attempts`: Retry limit for generation (default: `3`)
- `verify_timeout`: Z3 timeout in milliseconds (default: `10000`)
- `optimize_timeout`: Optimization timeout in milliseconds, JSON backend only (default: `100000`)
- `cache_dir`: Program cache directory (default: `tempfile.gettempdir()`)
- `z3_path`: Path to the Z3 executable for the SMT2 backend (default: `"z3"`)
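To make the parameters concrete, here is a minimal construction sketch; it assumes the `openai` package is installed and `OPENAI_API_KEY` is set in the environment:

```python
from openai import OpenAI

from z3adapter.reasoning.proof_of_thought import ProofOfThought

client = OpenAI()  # any OpenAI/AzureOpenAI client instance works

pot = ProofOfThought(
    llm_client=client,
    backend="smt2",   # default; "json" selects the JSON backend
    max_attempts=3,   # retries with error feedback on failure
)
```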
### query()

```python
def query(
    self,
    question: str,
    temperature: float = 0.1,
    max_tokens: int = 16384,
    save_program: bool = False,
    program_path: str | None = None,
) -> QueryResult
```
Parameters:

- `question`: Natural language question
- `temperature`: LLM temperature (default: `0.1`; ignored for GPT-5, which only supports `1.0`)
- `max_tokens`: Maximum completion tokens (default: `16384`)
- `save_program`: Save the generated program to disk (default: `False`)
- `program_path`: Custom save path (default: auto-generated in `cache_dir`)

Returns: `QueryResult`
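A short usage sketch with the `pot` instance from above (the question is illustrative):

```python
result = pot.query("Would a vegetarian eat a plant-based burger?")

if result.success and result.answer is not None:
    print(f"Answer: {result.answer} after {result.num_attempts} attempt(s)")
else:
    # answer is None when SAT/UNSAT counts are ambiguous or execution failed
    print(f"No definitive answer: {result.error}")
```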
Implementation details:

The method implements a retry loop with error feedback:

```python
for attempt in range(1, max_attempts + 1):
    if attempt == 1:
        gen_result = self.generator.generate(question, temperature, max_tokens)
    else:
        # error_trace and previous_response are carried over from the
        # failed execution of the previous attempt
        gen_result = self.generator.generate_with_feedback(
            question, error_trace, previous_response, temperature, max_tokens
        )
    # ... execute and check result
```
## QueryResult

Contains the results of a reasoning query.

```python
@dataclass
class QueryResult:
    question: str                        # Input question
    answer: bool | None                  # True (SAT), False (UNSAT), None (ambiguous/error)
    json_program: dict[str, Any] | None  # Generated program if JSON backend
    sat_count: int                       # SAT occurrences in output
    unsat_count: int                     # UNSAT occurrences
    output: str                          # Raw Z3 output
    success: bool                        # Execution completed
    num_attempts: int                    # Generation attempts used
    error: str | None                    # Error message if failed
```
## EvaluationPipeline

Runs batch evaluation of reasoning questions over a dataset.

Location: `z3adapter.reasoning.evaluation.EvaluationPipeline`
### Constructor

```python
def __init__(
    self,
    proof_of_thought: ProofOfThought,
    output_dir: str = "evaluation_results",
    num_workers: int = 1,
) -> None
```
Parameters:

- `proof_of_thought`: Configured `ProofOfThought` instance
- `output_dir`: Results directory (default: `"evaluation_results"`)
- `num_workers`: Number of parallel workers (default: `1`; uses `ThreadPoolExecutor` if `> 1`)
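A construction sketch, reusing the `pot` instance from the `ProofOfThought` example above:

```python
from z3adapter.reasoning.evaluation import EvaluationPipeline

pipeline = EvaluationPipeline(
    proof_of_thought=pot,
    output_dir="evaluation_results",
    num_workers=4,  # > 1 switches to ThreadPoolExecutor-based parallelism
)
```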
### evaluate()

```python
def evaluate(
    self,
    dataset: list[dict[str, Any]] | str,
    question_field: str = "question",
    answer_field: str = "answer",
    id_field: str | None = None,
    max_samples: int | None = None,
    skip_existing: bool = True,
) -> EvaluationResult
```
Parameters:

- `dataset`: Path to a JSON file or a list of dicts
- `question_field`: Field name for the question text (default: `"question"`)
- `answer_field`: Field name for the ground truth (default: `"answer"`)
- `id_field`: Field for the sample ID (default: `None`; auto-generates `sample_{idx}`)
- `max_samples`: Limit on the number of samples (default: `None`, i.e. all)
- `skip_existing`: Skip samples with cached results (default: `True`)

Returns: `EvaluationResult`
Caching behavior:

Results are cached by writing `{sample_id}_result.json` and `{sample_id}_program{ext}` files to `output_dir`.
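A usage sketch with an inline dataset whose field names match the defaults (the samples are illustrative):

```python
dataset = [
    {"question": "Is water wet?", "answer": True},
    {"question": "Do fish fly?", "answer": False},
]

eval_result = pipeline.evaluate(dataset)
# Per-sample results and programs are written to output_dir, so a
# re-run with skip_existing=True resumes where it left off.
```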
## EvaluationMetrics

Provides comprehensive metrics for evaluation results.

```python
@dataclass
class EvaluationMetrics:
    accuracy: float             # sklearn.metrics.accuracy_score
    precision: float            # sklearn.metrics.precision_score (zero_division=0)
    recall: float               # sklearn.metrics.recall_score (zero_division=0)
    f1_score: float             # 2 * (P * R) / (P + R)
    specificity: float          # TN / (TN + FP)
    false_positive_rate: float  # FP / (FP + TN)
    false_negative_rate: float  # FN / (FN + TP)
    tp: int                     # True positives
    fp: int                     # False positives
    tn: int                     # True negatives
    fn: int                     # False negatives
    total_samples: int          # correct + wrong + failed
    correct_answers: int        # answer == ground_truth
    wrong_answers: int          # answer != ground_truth
    failed_answers: int         # success == False
```

Metrics are computed using `sklearn.metrics.confusion_matrix` for binary classification.
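The derived rates follow directly from the confusion-matrix cells; a sketch of the computation (variable names are illustrative, not the library's internals):

```python
from sklearn.metrics import confusion_matrix

y_true = [True, False, True, True, False]
y_pred = [True, False, False, True, True]

# With labels=[False, True], ravel() yields (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[False, True]).ravel()

specificity = tn / (tn + fp) if (tn + fp) else 0.0          # TN / (TN + FP)
false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0  # FP / (FP + TN)
false_negative_rate = fn / (fn + tp) if (fn + tp) else 0.0  # FN / (FN + TP)
```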
## Backend

Defines the abstract interface for execution backends.

Location: `z3adapter.backends.abstract.Backend`
### Interface Methods

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    @abstractmethod
    def execute(self, program_path: str) -> VerificationResult:
        pass

    @abstractmethod
    def get_file_extension(self) -> str:
        pass

    @abstractmethod
    def get_prompt_template(self) -> str:
        pass

    def determine_answer(self, sat_count: int, unsat_count: int) -> bool | None:
        # Only SAT seen -> True; only UNSAT seen -> False; mixed or neither -> None
        if sat_count > 0 and unsat_count == 0:
            return True
        elif unsat_count > 0 and sat_count == 0:
            return False
        else:
            return None
```
Concrete implementations are provided by `SMT2Backend` and `JSONBackend`.
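To illustrate the contract, here is a minimal sketch of a custom backend; `ShellZ3Backend` is hypothetical, and the prompt template is a placeholder:

```python
import subprocess


class ShellZ3Backend(Backend):
    """Hypothetical backend that shells out to a local z3 binary."""

    def execute(self, program_path: str) -> VerificationResult:
        proc = subprocess.run(["z3", program_path], capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        # Count whole tokens so "unsat" is not also counted as "sat"
        tokens = output.split()
        sat_count = tokens.count("sat")
        unsat_count = tokens.count("unsat")
        return VerificationResult(
            answer=self.determine_answer(sat_count, unsat_count),
            sat_count=sat_count,
            unsat_count=unsat_count,
            output=output,
            success=proc.returncode == 0,
            error=None if proc.returncode == 0 else output,
        )

    def get_file_extension(self) -> str:
        return ".smt2"

    def get_prompt_template(self) -> str:
        return "Write an SMT-LIB 2 program that answers: {question}"  # placeholder
```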
## VerificationResult

Encapsulates the results of Z3 verification execution.

```python
@dataclass
class VerificationResult:
    answer: bool | None  # True (SAT), False (UNSAT), None (ambiguous/error)
    sat_count: int
    unsat_count: int
    output: str          # Raw execution output
    success: bool        # Execution completed without exception
    error: str | None    # Error message if failed
```
## Z3ProgramGenerator

Handles LLM-based program generation with error recovery.

Location: `z3adapter.reasoning.program_generator.Z3ProgramGenerator`
### generate()

```python
def generate(
    self,
    question: str,
    temperature: float = 0.1,
    max_tokens: int = 16384,
) -> GenerationResult
```
LLM API call:

```python
response = self.llm_client.chat.completions.create(
    model=self.model,
    messages=[{"role": "user", "content": prompt}],
    max_completion_tokens=max_tokens,
)
```

The `temperature` parameter is not passed to the API because GPT-5 supports only the default value of `1.0`.
### generate_with_feedback()

Enables multi-turn conversation with error feedback:

```python
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": previous_response},
    {"role": "user", "content": feedback_message},
]
```
## Utility: Azure Config

Builds client configuration for Azure OpenAI deployments from environment variables.

Location: `utils.azure_config.get_client_config()`
Returns:

```python
{
    "llm_client": AzureOpenAI(...),
    "model": str,  # deployment name from env
}
```
Required environment variables:

- `AZURE_OPENAI_API_KEY`
- `AZURE_OPENAI_ENDPOINT`
- `AZURE_OPENAI_API_VERSION`
- `AZURE_GPT5_DEPLOYMENT_NAME` or `AZURE_GPT4O_DEPLOYMENT_NAME`
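Because the returned keys match the `ProofOfThought` constructor parameters, the config can be splatted directly (assuming the variables above are set):

```python
from utils.azure_config import get_client_config
from z3adapter.reasoning.proof_of_thought import ProofOfThought

pot = ProofOfThought(**get_client_config())
```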