API Reference¶
This reference documents the public API for ProofOfThought.
ProofOfThought¶
The main entry point for the reasoning system.
Location: z3adapter.reasoning.proof_of_thought.ProofOfThought
Constructor¶
def __init__(
    self,
    llm_client: Any,
    model: str = "gpt-5",
    backend: Literal["json", "smt2"] = "smt2",
    max_attempts: int = 3,
    verify_timeout: int = 10000,
    optimize_timeout: int = 100000,
    cache_dir: str | None = None,
    z3_path: str = "z3",
) -> None
Parameters:
llm_client: OpenAI/AzureOpenAI client instance
model: Deployment/model name (default: "gpt-5")
backend: "json" or "smt2" (default: "smt2")
max_attempts: Retry limit for generation (default: 3)
verify_timeout: Z3 timeout in milliseconds (default: 10000)
optimize_timeout: Optimization timeout in milliseconds, JSON backend only (default: 100000)
cache_dir: Program cache directory (default: tempfile.gettempdir())
z3_path: Z3 executable path for SMT2 (default: "z3")
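A minimal construction with the default SMT2 backend (a sketch; the OpenAI client setup is illustrative and assumes OPENAI_API_KEY is set in the environment):

from openai import OpenAI
from z3adapter.reasoning.proof_of_thought import ProofOfThought

client = OpenAI()  # illustrative; any OpenAI/AzureOpenAI client works
pot = ProofOfThought(llm_client=client, model="gpt-5", backend="smt2")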
query()¶
def query(
    self,
    question: str,
    temperature: float = 0.1,
    max_tokens: int = 16384,
    save_program: bool = False,
    program_path: str | None = None,
) -> QueryResult
Parameters:
question: Natural language question
temperature: LLM temperature (default: 0.1; ignored for GPT-5, which only supports 1.0)
max_tokens: Max completion tokens (default: 16384)
save_program: Save generated program to disk (default: False)
program_path: Custom save path (default: auto-generated in cache_dir)
Returns: QueryResult
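A typical call, saving the generated program for later inspection (a sketch; the question text is illustrative):

result = pot.query(
    "Would a vegetarian eat a plant-based burger?",
    save_program=True,  # persists the generated program under cache_dir
)
print(result.answer, result.num_attempts)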
Implementation details:
The method implements a retry loop with error feedback:
for attempt in range(1, max_attempts + 1):
    if attempt == 1:
        gen_result = self.generator.generate(question, temperature, max_tokens)
    else:
        gen_result = self.generator.generate_with_feedback(
            question, error_trace, previous_response, temperature, max_tokens
        )
    # ... execute and check result
QueryResult¶
Contains the results of a reasoning query.
@dataclass
class QueryResult:
    question: str                        # Input question
    answer: bool | None                  # True (SAT), False (UNSAT), None (ambiguous/error)
    json_program: dict[str, Any] | None  # Generated program if JSON backend
    sat_count: int                       # SAT occurrences in output
    unsat_count: int                     # UNSAT occurrences
    output: str                          # Raw Z3 output
    success: bool                        # Execution completed
    num_attempts: int                    # Generation attempts used
    error: str | None                    # Error message if failed
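Because answer is tri-state, callers should distinguish failed runs from ambiguous verdicts. A sketch using the fields above:

if not result.success:
    print(f"Execution failed: {result.error}")
elif result.answer is None:
    print("Ambiguous: output contained both or neither of SAT/UNSAT")
else:
    print("Answer:", "Yes" if result.answer else "No")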
EvaluationPipeline¶
Facilitates batch evaluation of reasoning questions on datasets.
Location: z3adapter.reasoning.evaluation.EvaluationPipeline
Constructor¶
def __init__(
    self,
    proof_of_thought: ProofOfThought,
    output_dir: str = "evaluation_results",
    num_workers: int = 1,
) -> None
Parameters:
proof_of_thought: Configured ProofOfThought instance
output_dir: Results directory (default: "evaluation_results")
num_workers: Parallel workers (default: 1; uses ThreadPoolExecutor if > 1)
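A sketch constructing the pipeline, reusing the pot instance from the constructor example above:

from z3adapter.reasoning.evaluation import EvaluationPipeline

pipeline = EvaluationPipeline(
    proof_of_thought=pot,
    output_dir="evaluation_results",
    num_workers=4,  # > 1 enables ThreadPoolExecutor-based parallelism
)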
evaluate()¶
def evaluate(
    self,
    dataset: list[dict[str, Any]] | str,
    question_field: str = "question",
    answer_field: str = "answer",
    id_field: str | None = None,
    max_samples: int | None = None,
    skip_existing: bool = True,
) -> EvaluationResult
Parameters:
dataset: JSON file path or list of dicts
question_field: Field name for question text (default: "question")
answer_field: Field name for ground truth (default: "answer")
id_field: Field for sample ID (default: None; auto-generates sample_{idx})
max_samples: Limit on number of samples (default: None, meaning all)
skip_existing: Skip cached results (default: True)
Returns: EvaluationResult
Caching behavior:
Results are cached by saving {sample_id}_result.json and {sample_id}_program{ext} files to output_dir.
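For example, evaluating a small in-memory dataset (a sketch; the sample questions are illustrative):

dataset = [
    {"question": "Is 7 a prime number?", "answer": True},
    {"question": "Is 9 a prime number?", "answer": False},
]
eval_result = pipeline.evaluate(dataset)  # returns an EvaluationResult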
EvaluationMetrics¶
Provides comprehensive metrics for evaluation results.
@dataclass
class EvaluationMetrics:
    accuracy: float             # sklearn.metrics.accuracy_score
    precision: float            # sklearn.metrics.precision_score (zero_division=0)
    recall: float               # sklearn.metrics.recall_score (zero_division=0)
    f1_score: float             # 2 * (P * R) / (P + R)
    specificity: float          # TN / (TN + FP)
    false_positive_rate: float  # FP / (FP + TN)
    false_negative_rate: float  # FN / (FN + TP)
    tp: int                     # True positives
    fp: int                     # False positives
    tn: int                     # True negatives
    fn: int                     # False negatives
    total_samples: int          # Correct + wrong + failed
    correct_answers: int        # answer == ground_truth
    wrong_answers: int          # answer != ground_truth
    failed_answers: int         # success == False
Metrics are computed using sklearn.metrics.confusion_matrix for binary classification.
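As an illustration, the documented counts map onto sklearn like this (a sketch of the computation described above, not the library's internal code):

from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [True, False, True, True]   # ground-truth labels
y_pred = [True, False, False, True]  # predicted answers
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)  # 0.75
specificity = tn / (tn + fp)               # TN / (TN + FP) = 1.0
false_negative_rate = fn / (fn + tp)       # FN / (FN + TP) ≈ 0.33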
Backend¶
Defines the abstract interface for execution backends.
Location: z3adapter.backends.abstract.Backend
Interface Methods¶
from abc import ABC, abstractmethod

class Backend(ABC):
    @abstractmethod
    def execute(self, program_path: str) -> VerificationResult:
        pass

    @abstractmethod
    def get_file_extension(self) -> str:
        pass

    @abstractmethod
    def get_prompt_template(self) -> str:
        pass

    def determine_answer(self, sat_count: int, unsat_count: int) -> bool | None:
        # Unanimous SAT -> True; unanimous UNSAT -> False; mixed or empty -> None
        if sat_count > 0 and unsat_count == 0:
            return True
        elif unsat_count > 0 and sat_count == 0:
            return False
        else:
            return None
Concrete implementations are provided by SMT2Backend and JSONBackend.
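A custom backend only needs to supply the three abstract methods; determine_answer is inherited. A hypothetical minimal subclass (EchoBackend is illustrative, not part of the library):

class EchoBackend(Backend):
    """Hypothetical backend that treats the program file as raw Z3 output."""

    def execute(self, program_path: str) -> VerificationResult:
        with open(program_path) as f:
            output = f.read()
        unsat_count = output.count("unsat")
        # "sat" also matches inside "unsat", so subtract the overlap
        sat_count = output.count("sat") - unsat_count
        return VerificationResult(
            answer=self.determine_answer(sat_count, unsat_count),
            sat_count=sat_count,
            unsat_count=unsat_count,
            output=output,
            success=True,
            error=None,
        )

    def get_file_extension(self) -> str:
        return ".smt2"

    def get_prompt_template(self) -> str:
        return "Write an SMT-LIB 2 program that answers: {question}"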
VerificationResult¶
Encapsulates the results of Z3 verification execution.
@dataclass
class VerificationResult:
    answer: bool | None  # True (SAT), False (UNSAT), None (ambiguous/error)
    sat_count: int       # SAT occurrences in output
    unsat_count: int     # UNSAT occurrences
    output: str          # Raw execution output
    success: bool        # Execution completed without exception
    error: str | None    # Error message if failed
Z3ProgramGenerator¶
Handles LLM-based program generation with error recovery.
Location: z3adapter.reasoning.program_generator.Z3ProgramGenerator
generate()¶
def generate(
    self,
    question: str,
    temperature: float = 0.1,
    max_tokens: int = 16384,
) -> GenerationResult
LLM API Call:
response = self.llm_client.chat.completions.create(
    model=self.model,
    messages=[{"role": "user", "content": prompt}],
    max_completion_tokens=max_tokens,
)
Note that the temperature parameter is not passed to the API due to GPT-5 constraints.
generate_with_feedback()¶
Enables multi-turn conversation with error feedback:
messages=[
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": previous_response},
    {"role": "user", "content": feedback_message},
]
Utility: Azure Config¶
Provides convenient configuration for Azure OpenAI deployments.
Location: utils.azure_config.get_client_config()
Returns:
{
    "llm_client": AzureOpenAI(...),
    "model": str,  # Deployment name from env
}
Required environment variables:
AZURE_OPENAI_API_KEY
AZURE_OPENAI_ENDPOINT
AZURE_OPENAI_API_VERSION
AZURE_GPT5_DEPLOYMENT_NAME or AZURE_GPT4O_DEPLOYMENT_NAME
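Usage (a sketch based on the documented return value):

from utils.azure_config import get_client_config
from z3adapter.reasoning.proof_of_thought import ProofOfThought

config = get_client_config()  # reads the environment variables above
pot = ProofOfThought(llm_client=config["llm_client"], model=config["model"])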