.. _examples:

================
Examples
================

This section collects usage examples for PyCorrAna.

Basic Examples
==============

Quick Analysis of a CSV File
----------------------------

.. code-block:: python

   from pycorrana import quick_corr

   result = quick_corr('sales_data.csv')
   
   print(result['significant_pairs'][:5])

Analyzing Excel Data
--------------------

.. code-block:: python

   from pycorrana import quick_corr

   result = quick_corr(
       'data.xlsx',
       target='revenue',
       export='correlation_results.xlsx'
   )

Choosing the Correlation Method
-------------------------------

.. code-block:: python

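   from pycorrana import CorrAnalyzer

   # df: a pandas DataFrame loaded beforehand (e.g. with pd.read_csv)
   analyzer = CorrAnalyzer(
       df,
       method='spearman',        # Spearman rank correlation
       missing_strategy='fill',  # impute missing values instead of dropping rows
       fill_method='median'      # impute with each column's median
   )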
   
   result = analyzer.fit()
   analyzer.plot_heatmap()
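
Spearman's coefficient is Pearson's correlation computed on ranks, which is why it captures any monotonic relationship rather than only linear ones. The sketch below illustrates that definition with NumPy alone; the helpers ``rank_average`` and ``spearman_rho`` are hypothetical and not part of PyCorrAna, which may compute the coefficient differently (for example through SciPy).

.. code-block:: python

   import numpy as np

   def rank_average(a):
       """Ranks 1..n, with tied values sharing the average of their ranks."""
       a = np.asarray(a, dtype=float)
       order = np.argsort(a, kind="mergesort")
       ranks = np.empty(len(a), dtype=float)
       ranks[order] = np.arange(1, len(a) + 1)
       # Replace the ranks inside each group of ties by their mean
       for value in np.unique(a):
           mask = a == value
           ranks[mask] = ranks[mask].mean()
       return ranks

   def spearman_rho(x, y):
       """Spearman's rho computed as Pearson's r of the two rank vectors."""
       return float(np.corrcoef(rank_average(x), rank_average(y))[0, 1])

   x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
   y = x ** 3                      # monotonic but clearly non-linear
   print(spearman_rho(x, y))       # perfect monotonic agreement: 1 up to rounding
   print(np.corrcoef(x, y)[0, 1])  # Pearson's r stays below 1 on the same data

Because only the ordering of values matters, any strictly increasing transformation of ``x`` or ``y`` leaves Spearman's rho unchanged.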

Analysis Workflows
==================

Full Analysis Pipeline
----------------------

.. code-block:: python

   import pandas as pd
   from pycorrana import CorrAnalyzer

   df = pd.read_csv('data.csv')
   
   analyzer = CorrAnalyzer(df, verbose=True)
   
   analyzer.preprocess()
   
   analyzer.compute_correlation(target='target_column')
   
   result = {
       'correlation_matrix': analyzer.corr_matrix,
       'pvalue_matrix': analyzer.pvalue_matrix,
       'significant_pairs': analyzer.significant_pairs
   }
   
   analyzer.plot_heatmap(figsize=(14, 12), cluster=True)
   
   analyzer.export_results('results.xlsx')

Step-by-Step Analysis
---------------------

.. code-block:: python

   from pycorrana import CorrAnalyzer
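   from pycorrana.utils.data_utils import infer_types, handle_missing

   # Inspect how each column of df is classified
   type_mapping = infer_types(df)
   print("Column types:", type_mapping)

   # Fill missing values with column means before the analysis
   df_clean = handle_missing(df, strategy='fill', fill_method='mean')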
   
   analyzer = CorrAnalyzer(df_clean, method='auto')
   
   result = analyzer.fit(columns=['var1', 'var2', 'var3', 'target'])
   
   print(analyzer.summary())

Visualization Examples
======================

Customizing the Heatmap
-----------------------

.. code-block:: python

   from pycorrana import CorrAnalyzer

   analyzer = CorrAnalyzer(df)
   analyzer.fit()
   
   analyzer.plot_heatmap(
       figsize=(16, 14),
       annot=True,
       fmt='.2f',
       cmap='coolwarm',
       center=0,
       vmin=-1,
       vmax=1,
       linewidths=0.5,
       cluster=True,
       cluster_method='average',
       savefig='heatmap.png',
       dpi=300
   )

Scatter Plot Matrix
-------------------

.. code-block:: python

   analyzer.plot_pairplot(
       columns=['age', 'income', 'education', 'score'],
       hue='gender',
       diag_kind='kde',
       corner=True,
       savefig='pairplot.png'
   )

Grouped Box Plots
-----------------

.. code-block:: python

   analyzer.plot_boxplot(
       numeric_col='salary',
       categorical_col='department',
       kind='violin',
       savefig='salary_by_dept.png'
   )

Correlation Network
-------------------

.. code-block:: python

   analyzer.visualizer.plot_correlation_network(
       analyzer.corr_matrix,
       threshold=0.4,
       node_size=1000,
       layout='circular',
       savefig='network.png'
   )
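
A correlation network treats variables as nodes and connects two variables when the strength of their correlation reaches ``threshold``. The sketch below shows only that edge-selection step on a plain NumPy matrix. It assumes the rule is ``|r| >= threshold``, which may not match exactly how ``plot_correlation_network`` filters edges, and ``correlation_edges`` is a hypothetical helper.

.. code-block:: python

   import numpy as np

   def correlation_edges(corr, labels, threshold=0.4):
       """Return (label_a, label_b, r) for every pair with |r| >= threshold."""
       corr = np.asarray(corr, dtype=float)
       rows, cols = np.triu_indices_from(corr, k=1)   # each pair once, skip diagonal
       return [(labels[i], labels[j], float(corr[i, j]))
               for i, j in zip(rows, cols)
               if abs(corr[i, j]) >= threshold]

   corr = np.array([[1.0, 0.8, 0.1],
                    [0.8, 1.0, -0.5],
                    [0.1, -0.5, 1.0]])
   edges = correlation_edges(corr, ["a", "b", "c"])
   print(edges)  # [('a', 'b', 0.8), ('b', 'c', -0.5)]

Using the absolute value keeps strong negative correlations in the network; the sign can still be shown, for example, through edge color.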

Bar Chart of Significant Pairs
------------------------------

.. code-block:: python

   analyzer.visualizer.plot_significant_pairs(
       analyzer.significant_pairs,
       top_n=15,
       savefig='top_correlations.png'
   )

Partial Correlation Examples
============================

Controlling for a Single Covariate
----------------------------------

.. code-block:: python

   from pycorrana import partial_corr

   r, p = partial_corr(
       df,
       x='income',
       y='happiness',
       covars='age'
   )
   
   print(f"Partial correlation: {r:.4f}, p-value: {p:.4f}")
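
Partial correlation can be computed by regressing both variables on the covariates and correlating the residuals. The NumPy sketch below follows that textbook definition on simulated data (the variables and coefficients are invented for illustration). It is not necessarily how ``partial_corr`` is implemented, and ``partial_corr_residuals`` returns only the coefficient. A p-value is usually obtained from :math:`t = r\sqrt{(n - 2 - k)/(1 - r^2)}` with :math:`n - 2 - k` degrees of freedom, where :math:`k` is the number of covariates.

.. code-block:: python

   import numpy as np

   def partial_corr_residuals(x, y, z):
       """Partial correlation of x and y controlling for covariate(s) z."""
       x = np.asarray(x, dtype=float)
       y = np.asarray(y, dtype=float)
       z = np.asarray(z, dtype=float).reshape(len(x), -1)  # one column per covariate
       design = np.column_stack([np.ones(len(x)), z])      # intercept + covariates
       beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)
       beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)
       # Correlate what is left of x and y after removing the covariates' effect
       res_x = x - design @ beta_x
       res_y = y - design @ beta_y
       return float(np.corrcoef(res_x, res_y)[0, 1])

   # Simulated data: income and happiness are both driven by age only
   rng = np.random.default_rng(0)
   age = rng.normal(40, 10, 500)
   income = 2.0 * age + rng.normal(0, 5, 500)
   happiness = 0.5 * age + rng.normal(0, 5, 500)

   raw_r = np.corrcoef(income, happiness)[0, 1]
   partial_r = partial_corr_residuals(income, happiness, age)
   print(f"raw r = {raw_r:.3f}, partial r = {partial_r:.3f}")

The raw correlation is sizeable only because both variables depend on ``age``; once ``age`` is controlled for, the partial correlation is close to zero.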

Controlling for Multiple Covariates
-----------------------------------

.. code-block:: python

   from pycorrana import partial_corr

   covars = ['age', 'education', 'gender', 'location']
   
   r, p = partial_corr(
       df,
       x='income',
       y='happiness',
       covars=covars
   )

Partial Correlation Matrix
--------------------------

.. code-block:: python

   from pycorrana import partial_corr_matrix

   matrix = partial_corr_matrix(
       df,
       covars=['age'],
       columns=['income', 'health', 'happiness', 'social']
   )
   
   print(matrix)

Using PartialCorrAnalyzer
-------------------------

.. code-block:: python

   from pycorrana import PartialCorrAnalyzer

   analyzer = PartialCorrAnalyzer(df, covars=['age', 'education'])
   
   result = analyzer.fit(x='income', y='happiness')
   
   matrix = analyzer.compute_matrix(columns=['income', 'health', 'happiness'])

Nonlinear Dependency Examples
=============================

Distance Correlation
--------------------

.. code-block:: python

   from pycorrana import distance_correlation
   import numpy as np

   x = np.random.randn(100)
   y = x ** 2 + np.random.randn(100) * 0.1  # quadratic (non-linear) dependence

   dcor = distance_correlation(x, y)
   print(f"Distance correlation: {dcor:.4f}")
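
The population distance correlation is zero only when the two variables are independent (given finite first moments), so it detects a quadratic relationship that Pearson's r misses. The NumPy sketch below computes the standard sample statistic from double-centered pairwise distance matrices. It needs O(n²) memory, and ``distance_corr`` is a hypothetical helper, not PyCorrAna's implementation.

.. code-block:: python

   import numpy as np

   def _double_centered_distances(a):
       """Pairwise |a_i - a_j| distances with row, column and grand means removed."""
       a = np.asarray(a, dtype=float).reshape(-1, 1)
       d = np.abs(a - a.T)
       return (d - d.mean(axis=0, keepdims=True)
                 - d.mean(axis=1, keepdims=True) + d.mean())

   def distance_corr(x, y):
       """Sample distance correlation of two 1-D arrays."""
       A = _double_centered_distances(x)
       B = _double_centered_distances(y)
       dcov2_xy = (A * B).mean()          # squared distance covariance
       dvar2_x = (A * A).mean()           # squared distance variances
       dvar2_y = (B * B).mean()
       denom = np.sqrt(dvar2_x * dvar2_y)
       if denom == 0:
           return 0.0                     # a constant input has no dependence
       return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

   rng = np.random.default_rng(0)
   x = rng.standard_normal(500)
   y = x ** 2 + rng.standard_normal(500) * 0.1

   pearson_r = np.corrcoef(x, y)[0, 1]
   dcor = distance_corr(x, y)
   print(f"Pearson r = {pearson_r:.3f}, distance correlation = {dcor:.3f}")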

Mutual Information
------------------

.. code-block:: python

   from pycorrana import mutual_info_score

   mi = mutual_info_score(df['feature1'], df['feature2'])
   print(f"Mutual information: {mi:.4f}")
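
For two discrete variables, mutual information is :math:`\sum_{a,b} p(a,b)\log\frac{p(a,b)}{p(a)\,p(b)}` over the observed value pairs. The sketch below computes this plug-in estimate in nats directly from counts. It only illustrates the definition: ``mutual_info_discrete`` is hypothetical, and PyCorrAna's ``mutual_info_score`` may discretize continuous features or use a different estimator or log base.

.. code-block:: python

   import numpy as np
   from collections import Counter

   def mutual_info_discrete(x, y):
       """Plug-in mutual information (in nats) between two discrete sequences."""
       n = len(x)
       joint = Counter(zip(x, y))
       px = Counter(x)
       py = Counter(y)
       mi = 0.0
       for (a, b), n_ab in joint.items():
           # p(a,b) * log(p(a,b) / (p(a) p(b))), rewritten with raw counts
           mi += (n_ab / n) * np.log(n_ab * n / (px[a] * py[b]))
       return float(mi)

   x = [0, 0, 1, 1, 2, 2, 3, 3]
   same = mutual_info_discrete(x, x)            # equals the entropy of x, log(4)
   indep = mutual_info_discrete(x, [0, 1] * 4)  # second variable says nothing about x
   print(same, indep)

A variable shares all of its information with itself, so the first value is its entropy; the second is exactly ``0.0`` because every joint frequency equals the product of its marginals.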

Maximal Information Coefficient
-------------------------------

.. code-block:: python

   from pycorrana import maximal_information_coefficient

   mic = maximal_information_coefficient(df['x'], df['y'])
   print(f"MIC: {mic['mic']:.4f}")

.. note::

   **Performance note**: the current MIC implementation is written in pure Python and is slow. For large datasets, sample the data first:

   .. code-block:: python

      from pycorrana.utils import smart_sample

      # Sample first, then compute MIC
      sampled_df = smart_sample(df, sample_size=500)
      mic = maximal_information_coefficient(sampled_df['x'], sampled_df['y'])
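
Sampling helps because MIC's cost grows quickly with the number of observations, so a few hundred randomly chosen rows often give a usable estimate much faster. The sketch below draws a reproducible simple random sample of paired observations with NumPy. ``random_subsample`` is a hypothetical helper; ``smart_sample`` may follow a different strategy (for example stratified sampling).

.. code-block:: python

   import numpy as np

   def random_subsample(x, y, sample_size=500, seed=42):
       """Pick the same random rows from paired arrays x and y, without replacement."""
       x = np.asarray(x)
       y = np.asarray(y)
       if len(x) <= sample_size:
           return x, y  # already small enough
       rng = np.random.default_rng(seed)
       idx = rng.choice(len(x), size=sample_size, replace=False)
       return x[idx], y[idx]

   x_full = np.arange(10_000, dtype=float)
   y_full = 2 * x_full
   x_s, y_s = random_subsample(x_full, y_full)
   print(len(x_s), bool(np.all(y_s == 2 * x_s)))  # 500 True: pairs stay aligned

Indexing both arrays with the same ``idx`` is what keeps each ``(x, y)`` pair intact; sampling the two columns independently would destroy the dependence MIC is meant to measure.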

Nonlinear Dependency Report
---------------------------

.. code-block:: python

   from pycorrana import nonlinear_dependency_report

   report = nonlinear_dependency_report(
       df,
       top_n=20,
       methods=['dcor', 'mic']
   )
   
   print(report)

Using NonlinearAnalyzer
-----------------------

.. code-block:: python

   from pycorrana import NonlinearAnalyzer

   analyzer = NonlinearAnalyzer(df)
   
   result = analyzer.analyze_all(top_n=10)
   
   analyzer.plot_nonlinear_pairs(savefig='nonlinear.png')

Using the Example Datasets
==========================

Iris Dataset
------------

.. code-block:: python

   from pycorrana import load_iris, quick_corr

   iris = load_iris()
   
   result = quick_corr(iris, target='species')

Titanic Dataset
---------------

.. code-block:: python

   from pycorrana import load_titanic, CorrAnalyzer

   titanic = load_titanic()
   
   analyzer = CorrAnalyzer(
       titanic,
       missing_strategy='fill',
       fill_method='median'
   )
   
   result = analyzer.fit(target='survived')

Wine Dataset
------------

.. code-block:: python

   from pycorrana import load_wine, quick_corr

   wine = load_wine()
   
   result = quick_corr(wine, plot=True)

Generating Synthetic Data
-------------------------

.. code-block:: python

   from pycorrana import make_correlated_data, CorrAnalyzer

   df = make_correlated_data(
       n_samples=500,
       n_features=8,
       correlation_strength=0.6,
       noise_level=0.2
   )
   
   analyzer = CorrAnalyzer(df)
   result = analyzer.fit()
   analyzer.plot_heatmap(cluster=True)
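
A common way to generate data with a prescribed correlation structure is to multiply independent standard normal draws by the Cholesky factor of the target correlation matrix. The sketch below does this for an equicorrelated matrix, where every pair of features has correlation ``rho``. It illustrates the technique under that assumption only: ``make_equicorrelated`` is hypothetical, and how ``make_correlated_data`` interprets ``correlation_strength`` and ``noise_level`` is not assumed here.

.. code-block:: python

   import numpy as np

   def make_equicorrelated(n_samples=500, n_features=8, rho=0.6, seed=0):
       """Sample a multivariate normal whose features all share correlation rho."""
       # Target correlation matrix: 1 on the diagonal, rho everywhere else
       corr = np.full((n_features, n_features), rho)
       np.fill_diagonal(corr, 1.0)
       L = np.linalg.cholesky(corr)      # corr == L @ L.T
       rng = np.random.default_rng(seed)
       z = rng.standard_normal((n_samples, n_features))
       return z @ L.T                    # each row now has covariance corr

   data = make_equicorrelated()
   sample_corr = np.corrcoef(data, rowvar=False)
   print(np.round(sample_corr, 2))

The Cholesky factorization requires a positive-definite target; for an equicorrelated matrix this holds whenever ``-1/(n_features - 1) < rho < 1``.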

Canonical Correlation Analysis Examples
=======================================

Basic CCA
---------

.. code-block:: python

   from pycorrana import cca, load_iris

   df = load_iris()
   
   # Define the two variable sets
   X = df[['sepal_length', 'sepal_width']]
   Y = df[['petal_length', 'petal_width']]

   # Run canonical correlation analysis
   result = cca(X, Y)

   print("Canonical correlations:", result['canonical_correlations'])
   # Output: [0.9409, 0.1222]
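
Canonical correlations can be computed by centering each block, orthonormalizing its columns, and taking the singular values of the product of the two orthonormal bases. The NumPy sketch below follows that QR-plus-SVD approach on simulated data. It assumes both blocks have full column rank, and ``canonical_correlations`` is a hypothetical helper rather than the implementation behind ``cca``. It uses the names ``X_sim`` and ``Y_sim`` so that the ``X`` and ``Y`` defined above stay untouched.

.. code-block:: python

   import numpy as np

   def canonical_correlations(X, Y):
       """Canonical correlations between column blocks X (n x p) and Y (n x q)."""
       Xc = np.asarray(X, dtype=float)
       Yc = np.asarray(Y, dtype=float)
       Xc = Xc - Xc.mean(axis=0)        # center each block
       Yc = Yc - Yc.mean(axis=0)
       Qx, _ = np.linalg.qr(Xc)         # orthonormal basis of each block
       Qy, _ = np.linalg.qr(Yc)
       s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # sorted, largest first
       return np.clip(s, 0.0, 1.0)      # guard against round-off just above 1

   rng = np.random.default_rng(1)
   X_sim = rng.standard_normal((200, 2))
   Y_sim = np.column_stack([
       X_sim[:, 0] + 0.1 * rng.standard_normal(200),  # strongly tied to X_sim
       rng.standard_normal(200),                      # unrelated noise
   ])
   rho = canonical_correlations(X_sim, Y_sim)
   print(rho)

With one column in each block the result reduces to the absolute Pearson correlation, which makes a convenient sanity check.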

Inspecting Detailed Results
---------------------------

.. code-block:: python

   result = cca(X, Y)

   # Canonical correlations
   print("Canonical correlations:")
   for i, r in enumerate(result['canonical_correlations']):
       print(f"  Pair {i+1}: {r:.4f}")

   # Canonical weights of the X variables
   print("\nX canonical weights:")
   print(result['x_weights'])

   # Canonical weights of the Y variables
   print("\nY canonical weights:")
   print(result['y_weights'])

   # Significance tests
   print("\nSignificance tests:")
   for test in result['significance_tests']:
       print(f"  Canonical correlation {test['canonical_index'] + 1}: "
             f"Wilks' λ = {test['wilks_lambda']:.4f}, "
             f"p = {test['p_value']:.4f}")
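
For the k-th sequential test (k counted from 0), Wilks' lambda is :math:`\Lambda_k = \prod_{i \ge k} (1 - r_i^2)`, and a common large-sample test converts it to Bartlett's statistic :math:`\chi^2 = -\left(n - 1 - \frac{p + q + 1}{2}\right)\ln\Lambda_k` with :math:`(p - k)(q - k)` degrees of freedom. The sketch below computes both from a list of canonical correlations. The input values are made up, ``wilks_bartlett`` is hypothetical, and whether ``cca`` uses Bartlett's approximation or another one (such as Rao's F) is an assumption.

.. code-block:: python

   import numpy as np

   def wilks_bartlett(canon_corrs, n, p, q):
       """Wilks' lambda and Bartlett's chi-square for each sequential CCA test.

       Test k asks whether canonical correlations k, k+1, ... are all zero.
       """
       r2 = np.asarray(canon_corrs, dtype=float) ** 2
       results = []
       for k in range(len(r2)):
           wilks = float(np.prod(1.0 - r2[k:]))
           chi2_stat = -(n - 1 - (p + q + 1) / 2.0) * np.log(wilks)
           results.append({"index": k, "wilks_lambda": wilks,
                           "chi2": float(chi2_stat), "df": (p - k) * (q - k)})
       return results

   # Hypothetical input: two canonical correlations from n = 150 rows, p = q = 2
   tests = wilks_bartlett([0.9, 0.1], n=150, p=2, q=2)
   for row in tests:
       print(row)

The p-value is the upper tail of the chi-square distribution at ``chi2`` with ``df`` degrees of freedom, for example ``scipy.stats.chi2.sf(row["chi2"], row["df"])``.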

Permutation Test
----------------

.. code-block:: python

   from pycorrana import cca_permutation_test

   result = cca_permutation_test(
       X, Y,
       n_permutations=1000,
       random_state=42
   )
   
   print("Observed canonical correlations:", result['canonical_correlations'])
   print("Permutation p-values:", result['permutation_pvalues'])
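
A permutation test builds the null distribution by repeatedly shuffling one variable, which destroys any association while keeping each variable's distribution, and then counts how often the shuffled statistic is at least as large as the observed one. To stay short, the sketch below applies that recipe to a single Pearson correlation; ``permutation_pvalue`` is hypothetical, and ``cca_permutation_test`` presumably permutes the rows of ``Y`` and recomputes the canonical correlations instead.

.. code-block:: python

   import numpy as np

   def permutation_pvalue(x, y, n_permutations=1000, seed=42):
       """Two-sided permutation p-value for the Pearson correlation of x and y."""
       x = np.asarray(x, dtype=float)
       y = np.asarray(y, dtype=float)
       rng = np.random.default_rng(seed)
       observed = abs(np.corrcoef(x, y)[0, 1])
       hits = 0
       for _ in range(n_permutations):
           # Shuffling y breaks its pairing with x but keeps its distribution
           r = abs(np.corrcoef(x, rng.permutation(y))[0, 1])
           hits += int(r >= observed)
       # The +1 terms count the observed statistic itself, so p is never 0
       return (hits + 1) / (n_permutations + 1)

   rng = np.random.default_rng(0)
   a = rng.standard_normal(100)
   b = a + 0.5 * rng.standard_normal(100)
   p_value = permutation_pvalue(a, b, n_permutations=999)
   print(p_value)

The smallest attainable p-value is ``1 / (n_permutations + 1)``, so the number of permutations limits how small a reported p-value can be.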

Using the CCAAnalyzer Class
---------------------------

.. code-block:: python

   from pycorrana import CCAAnalyzer

   analyzer = CCAAnalyzer()
   result = analyzer.fit(X, Y)
   
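   # Canonical variate scores for each observation
   scores_x, scores_y = analyzer.get_scores(X, Y)

   # Correlation between each pair of canonical variates
   print("Correlation of paired canonical scores:")
   print(scores_x.corrwith(scores_y))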

Applied Example
---------------

Analyzing mental-health data:

.. code-block:: python

   from pycorrana import cca
   import pandas as pd

   df = pd.read_csv('psychology_data.csv')
   
   # Psychological measures
   psychological = df[['anxiety', 'depression', 'stress']]

   # Physiological measures
   physiological = df[['heart_rate', 'blood_pressure', 'cortisol']]

   result = cca(psychological, physiological)

   print("Psychological-physiological canonical correlations:",
         result['canonical_correlations'])

   # Interpret the first pair of canonical variates
   print("\nPsychological weights:", result['x_weights'][:, 0])
   print("Physiological weights:", result['y_weights'][:, 0])

Advanced Usage
==============

Custom Analysis Pipeline
------------------------

.. code-block:: python

   import numpy as np
   from pycorrana import CorrAnalyzer
   from pycorrana.utils.data_utils import load_data, infer_types
   from pycorrana.utils.stats_utils import correct_pvalues

   df = load_data('data.csv')

   # Keep only the columns inferred as numeric
   type_mapping = infer_types(df)
   numeric_cols = [k for k, v in type_mapping.items() if v == 'numeric']

   analyzer = CorrAnalyzer(df[numeric_cols], method='pearson')
   result = analyzer.fit()

   # The p-value matrix is symmetric: take the upper triangle so each pair
   # is counted once, then drop missing entries
   pmat = result['pvalue_matrix'].to_numpy(dtype=float)
   pvalues = pmat[np.triu_indices_from(pmat, k=1)]
   pvalues = pvalues[~np.isnan(pvalues)]
   corrected = correct_pvalues(pvalues.tolist(), method='bonferroni')
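
Bonferroni correction multiplies each p-value by the number of tests m and caps the result at 1, which controls the family-wise error rate at the chosen significance level. For k variables there are k(k - 1)/2 distinct pairs, so that is the m to use. The NumPy sketch below shows the arithmetic; ``bonferroni`` is a hypothetical helper, and the return type of ``correct_pvalues`` is not assumed here.

.. code-block:: python

   import numpy as np

   def bonferroni(pvalues):
       """Bonferroni-adjusted p-values: p * m, capped at 1."""
       p = np.asarray(pvalues, dtype=float)
       return np.minimum(p * p.size, 1.0)

   k = 4                          # four variables
   m = k * (k - 1) // 2           # six distinct pairs
   raw = np.array([0.001, 0.004, 0.02, 0.03, 0.2, 0.6])
   adjusted = bonferroni(raw)
   print(m, adjusted)

Counting each symmetric pair twice would double m and make the correction needlessly conservative.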

Batch-Processing Multiple Files
-------------------------------

.. code-block:: python

   import os
   from pycorrana import quick_corr

   data_dir = 'data/'
   output_dir = 'results/'
   os.makedirs(output_dir, exist_ok=True)  # make sure the output folder exists

   for filename in os.listdir(data_dir):
       if filename.endswith('.csv'):
           filepath = os.path.join(data_dir, filename)
           # 'sales.csv' -> 'results/sales_results.xlsx'
           stem = os.path.splitext(filename)[0]
           output_path = os.path.join(output_dir, f'{stem}_results.xlsx')

           result = quick_corr(
               filepath,
               export=output_path,
               plot=False,
               verbose=False
           )

           print(f"Processed: {filename}")

Combining with pandas
---------------------

.. code-block:: python

   import pandas as pd
   from pycorrana import CorrAnalyzer

   df = pd.read_csv('data.csv')
   
   grouped = df.groupby('category')

   for name, group in grouped:
       print(f"\n=== Group: {name} ===")
       # The grouping column is constant within a group, so leave it out
       analyzer = CorrAnalyzer(group.drop(columns='category'))
       result = analyzer.fit()
       print(analyzer.summary())